Encoding variants in structure

Bert Van Elsacker
Dear All,

The last example in my previous message was incorrect and should have
looked like this:

version 1: <p>This is a dummy sentence.</p>
           <p>Das Ziel des Lebens ist der Tod.</p>

version 2: <p>This is a dummy sentence. Das Ziel des Lebens ist der Tod.</p>

<p id="p0.0">This is a dummy sentence.</p>
<app>
  <lem wit="V1">
    <p id="p1.0" next="p1.1"/>
  </lem>
  <rdg wit="V2">
    <note type="var">omitted</note>
  </rdg>
</app>
<p id="p1.1" prev="p1.0">Das Ziel des Lebens ist der Tod.</p>

(This method is inspired by the fragmentation examples in the TEI
Guidelines, ch. 31.) The idea is a) in the lemma text, to disregard
the end of paragraph p1.0 (because it has a value for the
next-attribute) and b) for all versions, to disregard the beginning of
paragraph p1.1 (because it has a value for the prev-attribute). Only
in version 1 does a new paragraph start after the first sentence, because
the lem-element contains a p-element without a prev-attribute.
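
To make the two rules concrete, here is a minimal Python sketch of how a processor might apply them. It assumes that a p-element with a prev-attribute is simply appended to the paragraph that is currently open, and that a lem or rdg belongs to a witness when its wit value lists that sigil. The function `paragraphs_for` and the wrapping <div> are hypothetical, and the sketch does not require the preceding paragraph to carry a matching next-attribute (in version 2, p0.0 has none).

```python
import xml.etree.ElementTree as ET

# The fragmented example, wrapped in a hypothetical <div> so it parses as one document.
SAMPLE = """<div>
<p id="p0.0">This is a dummy sentence.</p>
<app>
  <lem wit="V1"><p id="p1.0" next="p1.1"/></lem>
  <rdg wit="V2"><note type="var">omitted</note></rdg>
</app>
<p id="p1.1" prev="p1.0">Das Ziel des Lebens ist der Tod.</p>
</div>"""


def paragraphs_for(root, witness):
    """Return the paragraph texts of one witness (hypothetical helper)."""
    paras = []

    def visit(container):
        # Children other than <p> and <app> (e.g. <note type="var">) are skipped.
        for child in container:
            if child.tag == "p":
                text = " ".join("".join(child.itertext()).split())
                if child.get("prev") is not None and paras:
                    # The beginning of a continuation fragment is disregarded:
                    # its text joins the paragraph that is currently open.
                    paras[-1] = (paras[-1] + " " + text).strip()
                else:
                    # Any other <p>, even an empty one, opens a new paragraph.
                    paras.append(text)
            elif child.tag == "app":
                # Descend only into the lem/rdg whose wit lists this witness.
                for reading in child:
                    if witness in reading.get("wit", "").split():
                        visit(reading)

    visit(root)
    return paras


root = ET.fromstring(SAMPLE)
print(paragraphs_for(root, "V1"))  # ['This is a dummy sentence.', 'Das Ziel des Lebens ist der Tod.']
print(paragraphs_for(root, "V2"))  # ['This is a dummy sentence. Das Ziel des Lebens ist der Tod.']
```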

(One could even add a p0.1 at the beginning of the lemma and give p0.0
a next-attribute with value p0.1, but that seems like overkill.)

In answer to Prof. O'Donnell: of course it is undesirable to collate
all typographical details. But sometimes certain typographical features
in a specific author's work are significant and should be collated.
The question is which encoding to use in those cases.

Best regards,

Bert Van Elsacker


Re: Encoding variants in structure

ron.vandenbranden
Administrator
Hi Bert,

At the Centre for Scholarly Editing and Document Studies (CTB) we've
encountered similar problems creating an electronic edition of the novel 'De
trein der traagheid' by the Flemish author Johan Daisne. That involved
collating 19 versions from the novel's print history and encoding their variants
using the parallel-segmentation method. It will soon result in a CD-ROM
publication in which all the text versions can be visualised and
dynamically compared with any other version(s). When versions are compared,
the relevant variants are selected and paragraph-level entries are generated in an
apparatus containing only the readings of the selected witnesses.

I think our approach lies somewhere between your question and the suggestions
Daniel O'Donnell made in his reply. We've included these structural
variants in the transcription, but marked them so that they only appear in
the visualisation of the relevant version and are excluded from the apparatus.

As for the first problem, we have encoded variants that only involve visual
highlighting in this way:

<app>
  <rdg wit="V1" rend="caps" type="not">Das Ziel des Lebens ist der Tod.</rdg>
  <rdg wit="V2" type="not">Das Ziel des Lebens ist der Tod.</rdg>
</app>

... with "type='not'" to exclude them from the apparatus.
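
The effect of type="not" can be sketched in Python: the visualisation of a witness uses every reading, while the apparatus skips readings marked type="not". This is a hypothetical illustration, not CTB's actual processing. The helper names, the treatment of rend="caps" as upper-casing, and the convention that an empty result means "no apparatus entry" are all assumptions.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<app>
  <rdg wit="V1" rend="caps" type="not">Das Ziel des Lebens ist der Tod.</rdg>
  <rdg wit="V2" type="not">Das Ziel des Lebens ist der Tod.</rdg>
</app>"""


def visualise(app, witness):
    """Text shown for one witness; every reading counts, whatever its type."""
    for rdg in app.findall("rdg"):
        if witness in rdg.get("wit", "").split():
            text = "".join(rdg.itertext())
            # Assumption: rend="caps" is rendered by upper-casing the text.
            return text.upper() if rdg.get("rend") == "caps" else text
    return ""


def apparatus_entry(app):
    """(witness, text) pairs for the apparatus, skipping type="not" readings."""
    return [
        (rdg.get("wit"), "".join(rdg.itertext()))
        for rdg in app.findall("rdg")
        if rdg.get("type") != "not"
    ]


app = ET.fromstring(SAMPLE)
print(visualise(app, "V1"))  # DAS ZIEL DES LEBENS IST DER TOD.
print(visualise(app, "V2"))  # Das Ziel des Lebens ist der Tod.
print(apparatus_entry(app))  # []  (nothing reaches the apparatus)
```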

The other problem, paragraph boundaries shifting across the
different print editions, seemed harder to tackle. It confronted us with
the markup (the means of representing structural / logical
structures) *itself* being subject to variation. To mark this, we
developed a milestone system of <anchor/> tags:
* <anchor type="insert"/> for paragraphs that were split relative to the
orientation version
* <anchor type="joinstart"/> and <anchor type="joinend"/> for paragraphs
that were joined relative to the orientation version

Supposing version 1 in your example is the orientation version, we would
have encoded it as:

<p>This is a dummy sentence.<anchor type="joinstart" corresp="V2"/></p>
<p><anchor type="joinend" corresp="V2"/>Das Ziel des Lebens ist der Tod.</p>

If version 2 were the orientation version, we would have done it like this:

<p>This is a dummy sentence.<anchor type="insert" corresp="V1"/> Das Ziel
des Lebens ist der Tod.</p>

Based on this encoding, the paragraphs can be '(de)activated' in the
visualisation of the corresponding version, but don't show up in the apparatus.
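
A Python sketch of how such milestones might be '(de)activated' for one witness, under these assumptions: an insert anchor whose corresp lists the witness splits the paragraph at that point, and a paragraph containing a matching joinend anchor is merged into the preceding one. For brevity the merge keys on joinend alone and ignores joinstart, which a real processor would presumably also check. The function names and wrapping <div> elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Version 1 as orientation version: V2 joins the two paragraphs.
V1_ORIENTED = """<div>
<p>This is a dummy sentence.<anchor type="joinstart" corresp="V2"/></p>
<p><anchor type="joinend" corresp="V2"/>Das Ziel des Lebens ist der Tod.</p>
</div>"""

# Version 2 as orientation version: V1 splits the paragraph.
V2_ORIENTED = """<div>
<p>This is a dummy sentence.<anchor type="insert" corresp="V1"/> Das Ziel
des Lebens ist der Tod.</p>
</div>"""


def applies(node, kind, witness):
    """True if node is an <anchor> of the given type that targets the witness."""
    return (node.tag == "anchor" and node.get("type") == kind
            and witness in node.get("corresp", "").split())


def paragraphs_for(root, witness):
    paras = []
    for p in root.iter("p"):
        segments = [p.text or ""]
        for child in p:
            if applies(child, "insert", witness):
                # Split: the text after the anchor starts a new paragraph.
                segments.append(child.tail or "")
            else:
                # Other children, including non-matching anchors, stay inline.
                segments[-1] += "".join(child.itertext()) + (child.tail or "")
        segments = [" ".join(s.split()) for s in segments]
        if any(applies(c, "joinend", witness) for c in p) and paras:
            # Join: this paragraph continues the previous one.
            paras[-1] = paras[-1] + " " + segments[0]
            paras.extend(segments[1:])
        else:
            paras.extend(segments)
    return paras


root = ET.fromstring(V2_ORIENTED)
print(paragraphs_for(root, "V1"))  # two paragraphs
print(paragraphs_for(root, "V2"))  # one paragraph
```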

This reduces markup, but of course puts a heavier burden on processing
(we've discovered in full the horrors of waking up milestone-delimited
hierarchies and the benefits of strict context control for their use). But
that seems inherent to dealing with overlapping hierarchies using XML + XSLT
at this point...

Ron

--
Ron Van den Branden
Wetenschappelijk attaché
Centrum voor Teksteditie en Bronnenstudie (CTB)
Koninklijke Academie voor Nederlandse Taal- en Letterkunde (KANTL)
Koningstraat 18 / b-9000 Gent / Belgium
e-mail : [hidden email]
http://www.kantl.be/ctb/staff/ron.htm


Re: Encoding variants in structure

Bert Van Elsacker
Hi,

Variants in paragraph boundaries are quite important in the works of
W.F. Hermans, so we add them to the in-line apparatus (like Ron, we use
parallel segmentation).

We use a mixed system to encode variants in text structure. If the
extra paragraph break occurs in the preferred version, we use
fragmentation (as shown in my earlier post), because we want markup
that lies outside apparatus elements and relates to text structure to be valid
for *all* versions, and we think the "next"- and "prev"-attributes on
an element make it very clear that its end or beginning shouldn't be
taken into account.

If the extra paragraph break occurs in another version, the encoding is
limited to this construction:

<p>This is a dummy sentence.
<app>
<lem><note type="var">omitted</note></lem>
<rdg wit="v2"><note type="var">paragraph break</note></rdg>
</app>
 Das Ziel
des Lebens ist der Tod.</p>

The extra paragraph break inside the rdg will be handled by the
processing system. It has no consequences for the text structure of
the other versions and therefore there seems to be no need for
fragmentation.
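
One way the processing system might handle the "paragraph break" note, sketched in Python: for each app, take the rdg whose wit lists the witness (falling back to the lem, i.e. the preferred version), and split the paragraph wherever the chosen reading contains <note type="var">paragraph break</note>. The trigger text, the fallback rule and the function names are assumptions drawn from this single example; the sigil "v1" for the preferred version is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>This is a dummy sentence.
<app>
<lem><note type="var">omitted</note></lem>
<rdg wit="v2"><note type="var">paragraph break</note></rdg>
</app>
 Das Ziel
des Lebens ist der Tod.</p>"""


def choose_reading(app, witness):
    """The rdg attested by the witness, else the lem (preferred version)."""
    for rdg in app.findall("rdg"):
        if witness in rdg.get("wit", "").split():
            return rdg
    return app.find("lem")


def paragraphs_for(p, witness):
    segments = [p.text or ""]
    for child in p:
        if child.tag == "app":
            reading = choose_reading(child, witness)
            if reading is not None:
                segments[-1] += reading.text or ""
                for part in reading:
                    if part.tag == "note" and part.get("type") == "var":
                        # Editorial notes are not text: "paragraph break" splits,
                        # anything else (e.g. "omitted") contributes nothing.
                        if (part.text or "").strip() == "paragraph break":
                            segments.append("")
                    else:
                        segments[-1] += "".join(part.itertext())
                    segments[-1] += part.tail or ""
        else:
            segments[-1] += "".join(child.itertext())
        segments[-1] += child.tail or ""
    return [" ".join(s.split()) for s in segments if s.strip()]


p = ET.fromstring(SAMPLE)
print(paragraphs_for(p, "v1"))  # one paragraph (lem: no break)
print(paragraphs_for(p, "v2"))  # two paragraphs
```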

Ron Van den Branden wrote:
>(we've discovered in full the horrors of waking up milestone-delimited
>hierarchies and the benefits of strict context control for their use).

For rebuilding a hierarchy encoded with milestone elements, the
grouping facilities in XSLT 2.0 come in very handy.
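
XSLT 2.0's xsl:for-each-group with group-starting-with opens a new group at every item that matches a pattern (the first item always opens one). The same idea, sketched in Python on the mixed content of Ron's version-2-oriented paragraph, with every insert anchor opening a new group. The helper names are hypothetical, corresp is ignored for brevity, and only the group-starting-with behaviour is mirrored, not XSLT's full grouping semantics.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>This is a dummy sentence.<anchor type="insert" corresp="V1"/> Das Ziel
des Lebens ist der Tod.</p>"""


def group_starting_with(items, starts_group):
    """The first item always opens a group; every later item matching
    the predicate opens a new one (as in group-starting-with)."""
    groups = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([])
        groups[-1].append(item)
    return groups


def mixed_content(elem):
    """Flatten an element's content into text strings and child elements."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes


def is_insert_anchor(node):
    return (isinstance(node, ET.Element) and node.tag == "anchor"
            and node.get("type") == "insert")


def group_text(group):
    text = "".join(n if isinstance(n, str) else "".join(n.itertext()) for n in group)
    return " ".join(text.split())


p = ET.fromstring(SAMPLE)
groups = group_starting_with(mixed_content(p), is_insert_anchor)
print([group_text(g) for g in groups])
# ['This is a dummy sentence.', 'Das Ziel des Lebens ist der Tod.']
```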