encoding structurally unequal text versions in an apparatus


ron.vandenbranden
Administrator
Hi,

We are creating an edition of a 20th-century novel, in which we're
encoding the textual variation between different versions with the
parallel-segmentation method. Some of these text witnesses are
"structurally unequal", in that they only contain parts of the entire text:
   -some witnesses only consist of a journal publication of a single
chapter of the novel
   -in other text witnesses, sections have been suppressed that were
present in the first print edition

For example, suppose this is the text structure for 3 text witnesses:
   -w1: part 1 (7 chapters), part 2 (2 chapters), appendix (4 poems)
   -w2: chapter 1 of part 1
   -w3: part 1, part 2

I am struggling to find an adequate encoding for these structural
differences. Since it could be a complicating factor, I should perhaps
state here that I am looking for a way to encode the apparatus as
explicitly as possible, meaning that I want to avoid "incomplete"
apparatus entries that silently omit text witnesses. In other words: for
validation purposes, I want to have a means to check that all <rdg>
elements refer to all <witness> elements that are relevant to the text
at that point (I thought I had seen a name for such explicitness in the
ML or elsewhere ("explicit" apparatus, "complete" apparatus) but can't
retrieve it).
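
A minimal sketch of such a completeness check, in Python with the standard-library ElementTree. The sample document and the function name incomplete_apps are made up for illustration, and the check naively assumes that every <witness> is relevant at every point in the text, which is exactly the assumption that fails for structurally unequal witnesses.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: three witnesses, two apparatus entries.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><listWit>
    <witness xml:id="w1"/><witness xml:id="w2"/><witness xml:id="w3"/>
  </listWit></sourceDesc></fileDesc></teiHeader>
  <text><body><p>
    <app><rdg wit="#w1 #w3">grey</rdg><rdg wit="#w2">gray</rdg></app>
    <app><rdg wit="#w1">house</rdg><rdg wit="#w2">home</rdg></app>
  </p></body></text>
</TEI>"""

def incomplete_apps(root):
    """Return (position, missing witness ids) for each <app> that omits a witness."""
    # Assumption: all declared witnesses are expected in every <app>.
    expected = {w.get(XML_ID) for w in root.iter(TEI + "witness")}
    problems = []
    for position, app in enumerate(root.iter(TEI + "app"), start=1):
        cited = set()
        # Only direct <lem>/<rdg> children are inspected (no <rdgGrp> handling).
        for reading in app:
            if reading.tag in (TEI + "lem", TEI + "rdg"):
                cited.update(ref.lstrip("#") for ref in reading.get("wit", "").split())
        missing = expected - cited
        if missing:
            problems.append((position, sorted(missing)))
    return problems

print(incomplete_apps(ET.fromstring(SAMPLE)))
```

On this sample, the first <app> cites all three witnesses, while the second omits w3 and is reported as [(2, ['w3'])].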

I've considered a couple of options:

[1] Since version 2.9.0, <rdg> allows a couple of chunk-level elements
such as <div> as children. If the structure of the text witnesses
outlined above would be "flattened" to //body/div (since <rdg> does not
allow front / body / back), that could yield a structure like this:

     <body>
       <!-- [part1]: in w1, w2, and w3 -->
       <div type="part" n="1">
         <!-- [part1][chapter1]: in w1, w2, and w3 -->
         <div type="chapter" n="1"><!-- ... --></div>
         <!-- [part1][chapters2-7]: only in w1 and w3 -->
         <app>
           <rdg wit="#w2"/>
           <rdg wit="#w1 #w3">
             <div type="chapter" n="2"><!-- ... --></div>
             <!-- ... -->
             <div type="chapter" n="7"><!-- ... --></div>
           </rdg>
         </app>
       </div>
       <!-- [part2]: only in w1 and w3-->
       <app>
         <rdg wit="#w1 #w3">
           <div type="part" n="2">
             <div type="chapter" n="1"><!-- ... --></div>
             <div type="chapter" n="2"><!-- ... --></div>
           </div>
         </rdg>
         <rdg wit="#w2"/>
       </app>
       <!-- [appendix]: only in w1 -->
       <app>
         <rdg wit="#w1">
           <div type="appendix">
             <div type="poem" n="1"><!-- ... --></div>
             <!-- ... -->
             <div type="poem" n="4"><!-- ... --></div>
           </div>
         </rdg>
         <rdg wit="#w2 #w3"/>
       </app>
     </body>


Yet, I'm not fond of this irregular document structure, where similar
structural units appear at different levels, e.g.:
body/div[@type='part'] vs body/app/rdg/div[@type='part']. Moreover, I'm
wary of overly heavy <app> elements, possibly containing many nested
<app> descendants.

[2] I've been trying to find a way to indicate what text witnesses are
relevant for what structural units in a text. I could think of @corresp
or even @ana on <div> elements, for pointing towards the relevant text
witnesses. This could greatly simplify the text structure:

     <front>
       <!-- [foreword]: only in w1 -->
       <div type="foreword" corresp="#w1"><!-- ... --></div>
     </front>
     <body>
       <!-- [part1]: in w1, w2, and w3 -->
       <div type="part" n="1" corresp="#w1 #w2 #w3">
         <!-- [part1][chapter1]: in w1, w2, and w3 -->
         <div type="chapter" n="1" corresp="#w1 #w2 #w3"><!-- ... --></div>
         <!-- [part1][chapters2-7]: only in w1 and w3 -->
         <div type="chapter" n="2" corresp="#w1 #w3"><!-- ... --></div>
         <!-- ... -->
         <div type="chapter" n="7" corresp="#w1 #w3"><!-- ... --></div>
       </div>
       <!-- [part2]: only in w1 and w3-->
       <div type="part" n="2" corresp="#w1 #w3">
         <div type="chapter" n="1"><!-- ... --></div>
         <div type="chapter" n="2"><!-- ... --></div>
       </div>
     </body>
     <back>
       <!-- [appendix]: only in w1 -->
       <div type="appendix" corresp="#w1">
         <div type="poem" n="1"><!-- ... --></div>
         <!-- ... -->
         <div type="poem" n="4"><!-- ... --></div>
       </div>
     </back>

This would permit a text structure that is both more faithful (with
front / body / back) and uniform (all structural units at the same
level). The @corresp (or @ana) mechanism then makes it possible to determine the
"complete" set of relevant text witnesses for that textual unit, so the
validation and processing layers of the edition know that only
references to #w1 and #w3 should be expected in
//div[@type='part'][@n='2']. Yet, as always, I'm not sure if this is
proper use of @corresp. I'm even less sure about @ana, since text
witnesses in <witness> elements hardly qualify as interpretations, or do
they? Another attribute I have considered is @decls, but that should
identify one or more *declarable* elements within the header, which
<witness> is not. Moreover, the structural inequality is now reflected
by @corresp (or @ana) appearing at different structural levels, which
complicates processing a bit, but not terribly, I think.
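
To illustrate how a validation layer might resolve the expected witnesses when @corresp sits at different levels, here is a rough Python sketch. It assumes that an <app> inherits the witness set of its nearest ancestor carrying @corresp; the sample fragment (namespace omitted for brevity) and the function name check_apps are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the TEI namespace is left out to keep the sketch short.
SAMPLE = """<body>
  <div type="part" n="1" corresp="#w1 #w2 #w3">
    <div type="chapter" n="1" corresp="#w1 #w2 #w3">
      <p><app><rdg wit="#w1 #w3">a</rdg><rdg wit="#w2">b</rdg></app></p>
    </div>
    <div type="chapter" n="2" corresp="#w1 #w3">
      <p><app><rdg wit="#w1">c</rdg><rdg wit="#w2">d</rdg></app></p>
    </div>
  </div>
  <div type="part" n="2" corresp="#w1 #w3">
    <div type="chapter" n="1">
      <p><app><rdg wit="#w1">e</rdg><rdg wit="#w3">f</rdg></app></p>
    </div>
  </div>
</body>"""

def check_apps(root):
    """For each <app>, return (missing, unexpected) witness references."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}
    report = []
    for app in root.iter("app"):
        node, expected = app, set()
        # Walk upwards to the nearest element that declares @corresp.
        while node is not None:
            if node.get("corresp"):
                expected = set(node.get("corresp").split())
                break
            node = parent.get(node)
        cited = {w for rdg in app.findall("rdg") for w in rdg.get("wit", "").split()}
        report.append((sorted(expected - cited), sorted(cited - expected)))
    return report

for missing, unexpected in check_apps(ET.fromstring(SAMPLE)):
    print("missing:", missing, "unexpected:", unexpected)
```

In this sample only the second <app> is flagged: chapter 2 of part 1 declares #w1 and #w3, so #w3 is missing and #w2 is unexpected. The <app> in part 2 takes its witness set from the enclosing part, since its chapter carries no @corresp of its own.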

[3] Another option that crossed my mind is to treat witnesses w2 and w3
as fragmentary witnesses, whose range could be flagged between
<witStart/> and <witEnd/>. Yet, these elements can only occur inside
<rdg>, and the description in the Guidelines suggests that this
fragmentation is understood at a micro-level and is inapplicable to
this kind of "high-level structural fragmentation".

Therefore, I'm leaning towards option [2], perhaps with a dedicated
custom attribute if @corresp / @ana are unfit. I would appreciate any
thoughts or advice, and am curious to know how this is dealt with in
other projects.

Best,

Ron

Re: encoding structurally unequal text versions in an apparatus

Franz Fischer-2
Dear Ron,

I think <witStart/> and <witEnd/> (option [3]) have been introduced
exactly to deal with this kind of situation and I do not see why they
should be applied on the micro-level only. Fragmentarity does not
necessarily imply that parts accidentally went missing and gaps follow
the textual structure only by chance.

I like option [2] but @corresp doesn't seem to me a suitable attribute,
unless the value, i.e. the reference to the corresponding witness,
indicates the exact passage or range of text in the witness, that is,
something like w1.p1.c1 rather than simply #w1. But that means you refer
to an existing transcript with identifiable sections.

I agree with you about option [1], it's terrible.

But my favourite solution would be some structured declaration, stand
off, in the header or any other dedicated section of the file,
expressing presence and absence of all witnesses for all relevant parts
and passages of the work, referring to the IDs of the respective
structural elements or milestones.

And the term you were looking for is probably a "positive" apparatus,
indicating both all confirming/accepted and all variant/rejected witness
readings.

Best,
Franz


On 08.09.2016 at 23:32, Ron Van den Branden wrote:

> [...]

Re: encoding structurally unequal text versions in an apparatus

ron.vandenbranden
Administrator
Dear Franz,

Many thanks for your thoughts.


On 20/09/2016 13:53, Franz Fischer wrote:
> I agree with you about option [1], it's terrible.

Ok, great (to think alike)!

>
> I think <witStart/> and <witEnd/> (option [3]) have been introduced
> exactly to deal with this kind of situation and I do not see why they
> should be applied on the micro-level only. Fragmentarity does not
> necessarily imply that parts accidentally went missing and gaps follow
> the textual structure only by chance.
>

Thanks for clarifying, but still it would be awkward (from a processing
point of view) to mix fragment delimiters (<witStart/> / <witEnd/>) with
"regular" variants at all kinds of hierarchical levels in the text. So
I'm not considering this option for larger structural variation.

> I like option [2] but @corresp doesn't seem to me a suitable
> attribute, unless the value, i.e. the reference to the corresponding
> witness, indicates the exact passage or range of text in the witness,
> that is, something like w1.p1.c1 rather than simply #w1. But that
> means you refer to an existing transcript with identifiable sections.

Such precise pointers would be required if you wanted to use @corresp
(or a more appropriate attribute) to point from the <witness> elements
(or another stand-off mechanism, see below) to the precise text
structures they contain. But when pointing in the opposite direction,
i.e. from the text structures to the <witness> elements, in a single
text with a positive apparatus for all text witnesses (thanks for the
terminological help), I don't see why this wouldn't suffice:

   <TEI xmlns="http://www.tei-c.org/ns/1.0">
     <teiHeader>
       <fileDesc>
         <!-- ... -->
         <sourceDesc>
           <listWit>
             <witness xml:id="w1">witness 1</witness>
             <witness xml:id="w2">witness 2</witness>
             <witness xml:id="w3">witness 3</witness>
           </listWit>
         </sourceDesc>
         <!-- ... -->
       </fileDesc>
     </teiHeader>
     <text>
       <front>
         <div type="foreword" corresp="#w1" xml:id="fw"><!-- ... --></div>
       </front>
       <body>
         <div type="part" n="1" corresp="#w1 #w2 #w3" xml:id="p1">
           <div type="chapter" n="1" corresp="#w1 #w2 #w3" xml:id="p1.c1"><!-- ... --></div>
           <div type="chapter" n="2" corresp="#w1 #w3" xml:id="p1.c2"><!-- ... --></div>
           <!-- ... -->
           <div type="chapter" n="7" corresp="#w1 #w3" xml:id="p1.c7"><!-- ... --></div>
         </div>
         <div type="part" n="2" corresp="#w1 #w3" xml:id="p2">
           <div type="chapter" n="1" xml:id="p2.c1"><!-- ... --></div>
           <div type="chapter" n="2" xml:id="p2.c2"><!-- ... --></div>
         </div>
       </body>
       <back>
         <div type="appendix" corresp="#w1" xml:id="ap">
           <div type="poem" n="1" xml:id="ap.p1"><!-- ... --></div>
           <!-- ... -->
           <div type="poem" n="4" xml:id="ap.p4"><!-- ... --></div>
         </div>
       </back>
     </text>
   </TEI>

My reservation w.r.t. @corresp still holds: I'm well aware this might be
yet another stretch of its general-purpose semantics. But the principle
seems usable, at least.

>
> But my favourite solution would be some structured declaration, stand
> off, in the header or any other dedicated section of the file,
> expressing presence and absence of all witnesses for all relevant
> parts and passages of the work, referring to the IDs of the respective
> structural elements or milestones.
>

So, that could be something like:

   <linkGrp type="witness-structure">
     <link corresp="#w1" target="#fw #p1 #p1.c1 #p1.c2 #p1.c7 #p2 #p2.c1 #p2.c2 #ap #ap.p1 #ap.p4"/>
     <link corresp="#w2" target="#p1.c1"/>
     <link corresp="#w3" target="#p1 #p1.c1 #p1.c2 #p1.c7 #p2 #p2.c1 #p2.c2"/>
   </linkGrp>

(somewhere in the document), or even:

   <listWit>
     <witness xml:id="w1" corresp="#fw #p1 #p1.c1 #p1.c2 #p1.c7 #p2 #p2.c1 #p2.c2 #ap #ap.p1 #ap.p4">witness 1</witness>
     <witness xml:id="w2" corresp="#p1.c1">witness 2</witness>
     <witness xml:id="w3" corresp="#p1 #p1.c1 #p1.c2 #p1.c7 #p2 #p2.c1 #p2.c2">witness 3</witness>
   </listWit>

When processing a certain text witness in an edition, this could
function as a lookup table to see what text structures need to be
included. Perhaps it could be simplified by reducing the references to
only the highest structural text units that are fully included in a
given text witness:

   <listWit>
     <witness xml:id="w1" corresp="#fw #p1 #p2 #ap">witness 1</witness>
     <witness xml:id="w2" corresp="#p1.c1">witness 2</witness>
     <witness xml:id="w3" corresp="#p1 #p2">witness 3</witness>
   </listWit>

Such a list makes it possible to determine for each element in the XML
document whether it should be:
     -included for a certain witness: the //witness/@corresp attribute
refers to the element, its ancestors or descendants
     -excluded for a certain witness: the //witness/@corresp attribute
does not refer to the element, its ancestors or descendants
Hence, checking the completeness of positive apparatuses with this
system is possible as well. I realize this could put a burden on the
processing layer by adding a lot of implied semantics about what
pointers should be included and how they should be processed, but as a
representation mechanism it seems to make sense. Again, perhaps @corresp
is not the best attribute, and perhaps other specialized attributes
could be coined (I could imagine e.g. a @wit-include and @wit-exclude
pair that could perhaps increase the granularity, though I fear it could
complicate processing even further); that's why I'm sticking to
<witness> as a place for expressing this information in my example. I'm
starting to like this option, actually.
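
The include/exclude rule above could be sketched roughly as follows in Python. The COVERAGE table mirrors the reduced <listWit> example but is hard-coded here as an assumption, the sample text skeleton is simplified and without namespace, and the function name included_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed lookup table: the targets of each witness's @corresp, without "#".
COVERAGE = {"w1": {"fw", "p1", "p2", "ap"}, "w2": {"p1.c1"}, "w3": {"p1", "p2"}}

# Simplified, hypothetical skeleton of the edited text.
SAMPLE = """<text>
  <front><div xml:id="fw"/></front>
  <body>
    <div xml:id="p1"><div xml:id="p1.c1"/><div xml:id="p1.c2"/></div>
    <div xml:id="p2"><div xml:id="p2.c1"/></div>
  </body>
  <back><div xml:id="ap"><div xml:id="ap.p1"/></div></back>
</text>"""

def included_ids(root, witness):
    """List the ids of all <div>s that belong to the given witness."""
    targets = COVERAGE[witness]
    parent = {child: p for p in root.iter() for child in p}
    keep = []
    for el in root.iter("div"):
        # ids of the element itself and of all its descendants
        below = {d.get(XML_ID) for d in el.iter()}
        # ids of all ancestors (None for ancestors without xml:id)
        above, node = set(), parent.get(el)
        while node is not None:
            above.add(node.get(XML_ID))
            node = parent.get(node)
        # Included if @corresp refers to the element, an ancestor or a descendant.
        if targets & (below | above):
            keep.append(el.get(XML_ID))
    return keep

root = ET.fromstring(SAMPLE)
print(included_ids(root, "w2"))
print(included_ids(root, "w3"))
```

For w2 this keeps p1 (as the container of the referenced chapter) and p1.c1, but not p1.c2; for w3 it keeps both parts with all their chapters and excludes the foreword and appendix.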

Best,

Ron