Document‐centric encoding


Stefan Baums
Dear list,

I am working on material (fragmentary early Buddhist
manuscripts) that calls for an encoding close to the
physical level of the document, and have been using
as one of my guidelines the work of the Manuscripts
SIG on documents and genetic criticism:

   http://www.tei-c.org/SIG/Manuscripts/genetic.html
   http://www.tei-c.org/Activities/Council/Working/tcw19.html

This work seems to be about eight years old now. The
first page states that “[t]he Module has been
accepted in principle by the TEI Council in April
2010 and the module is now ready for testing.” I can
see that some of it (for instance the <line> element)
has indeed made it into the TEI Guidelines, while
other aspects (for instance the <document> element)
have not.

Is there some account somewhere of the criteria for
inclusion or otherwise of the SIG’s proposals, or
indeed any more recent work (design or application)
in this area?

One particular thing I am wondering about is how to
express the following situation. A text of two lines
is broken into two fragments A and B such that A
contains the beginnings of each line, B the end of
each line, and some part of the text in between (here
marked with + signs) is missing:

   A          B
   bla di + + bla blo
   da dim + + dum

It seems to me that the <damage> element does not
quite have the right meaning, since in the following

   <damage group="A">
     bla di
   </damage>
   <gap quantity="2"/>
   <damage group="B">
     bla blo
   </damage>
   <damage group="A">
     da dim
   </damage>
   <gap quantity="2"/>
   <damage group="B">
     dum
   </damage>

the bits enclosed by <damage> are damaged only in the
sense of being incomplete at the left or right edge.

All best,
Stefan Baums

--
Stefan Baums, Ph.D.
Institut für Indologie und Tibetologie
Ludwig‐Maximilians‐Universität München

Re: Document‐centric encoding

Lou Burnard-6
The process by which the work of the SIG you refer to was incorporated
into the Guidelines is pretty exhaustively tracked in the discussions of
the TEI Council at the time: if you want to trace the ebb and flow of
discussion start with
http://www.tei-c.org/Activities/Council/Meetings/tcm45.xml !

But in practice I think almost everything in the SIG draft you're
referring to made it into the Guidelines, though with some changes. For
example the "document" element is in P5, but rejoices now in the name of
"sourceDoc".

In your example, I agree that <damage> is not appropriate. Your example
illustrates nicely why the new elements were considered useful. As soon
as you say "A text of two lines is broken into two fragments A and B
such that ..." it's clear that the object you want to describe can be
looked at from two perspectives: it is a (logical) text of two lines
(the first being "bla di ... bla blo") but it is also a (physical)
object of two fragments (A and B). You don't say whether these
"fragments" are shards of pottery or bits of parchment or what, but that
doesn't matter really:  what matters is that you are trying to describe
both the carrier (the fragments) and the material carried on it. In TEI,
you can use <text> for the latter and <sourceDoc> for the former. You
have to decide whether to make one or the other the dominant hierarchy
of your transcription, or provide both.

As an example of the latter choice, consider this:

<sourceDoc>
<surface>
<zone>
<line>bla di</line>
<line>da dim</line>
</zone>
<zone>
<line>bla blo</line>
<line>dum</line>
</zone>
</surface>
</sourceDoc>

<text>
<body>
<l>bla di <gap/> bla blo</l>
<l>da dim <gap/> dum </l>
</body>
</text>

(I've left out the detail you could supply to specify the alignment of
the <zone> elements with respect to each other and the <surface>
containing them, of course)
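
As a rough sketch of the alignment detail mentioned above, the TEI coordinate attributes (ulx, uly, lrx, lry) could be placed on the <surface> and on each <zone>. The coordinate values, the @n labels "A" and "B", and the Python helper names below are all assumptions made for illustration; nothing here comes from an actual manuscript.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical coordinates: fragment A on the left, fragment B on the right,
# with a lost strip of the original surface between them.
SOURCE_DOC = """
<sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
  <surface ulx="0" uly="0" lrx="100" lry="20">
    <zone n="A" ulx="0" uly="0" lrx="35" lry="20">
      <line>bla di</line>
      <line>da dim</line>
    </zone>
    <zone n="B" ulx="60" uly="0" lrx="100" lry="20">
      <line>bla blo</line>
      <line>dum</line>
    </zone>
  </surface>
</sourceDoc>
"""

COORDS = ("ulx", "uly", "lrx", "lry")


def zone_boxes(xml_text):
    """Return {zone label: (ulx, uly, lrx, lry)} for every <zone>."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for zone in root.iter(TEI + "zone"):
        boxes[zone.get("n")] = tuple(int(zone.get(a)) for a in COORDS)
    return boxes


def horizontal_gap(boxes, left, right):
    """Width of the lost strip between two zones, in surface units."""
    return boxes[right][0] - boxes[left][2]


boxes = zone_boxes(SOURCE_DOC)
print(boxes)
print(horizontal_gap(boxes, "A", "B"))
```

With coordinates recorded this way, the extent of the missing material can be computed from the encoding rather than stated separately.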

If you don't like the redundancy of this approach, there are various
hacks for reducing it; I leave it to others to suggest them.


On 11/01/18 16:11, Stefan Baums wrote:

> [...]

Re: Document‐centric encoding

Stefan Baums
Dear Lou,

> http://www.tei-c.org/Activities/Council/Meetings/tcm45.xml

thank you for the pointer to the minutes...

> For example the "document" element is in P5, but rejoices now in the
> name of "sourceDoc".

... and for alerting me to this name change!

> it is a (logical) text of two lines (the first
> being "bla di ... bla blo") but it is also a
> (physical) object of two fragments (A and B)

Yes, and we want to encode both aspects of it, are
thinking of factoring them out into two sections, and
pondering how to link the material in both sections.
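
One possible way to link the two sections, sketched here under stated assumptions: give each physical <line> an xml:id and point at it from <seg> elements in the <text> with @corresp. The ids (fragA, A1, ...), the choice of <seg> and @corresp, and the Python helper are illustrative assumptions, not a settled encoding; the teiHeader is omitted for brevity.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical ids; <surface> stands for a fragment here.
DOC = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <sourceDoc>
    <surface xml:id="fragA">
      <line xml:id="A1">bla di</line>
      <line xml:id="A2">da dim</line>
    </surface>
    <surface xml:id="fragB">
      <line xml:id="B1">bla blo</line>
      <line xml:id="B2">dum</line>
    </surface>
  </sourceDoc>
  <text>
    <body>
      <l><seg corresp="#A1">bla di</seg> <gap/> <seg corresp="#B1">bla blo</seg></l>
      <l><seg corresp="#A2">da dim</seg> <gap/> <seg corresp="#B2">dum</seg></l>
    </body>
  </text>
</TEI>
"""


def fragments_per_logical_line(xml_text):
    """For each <l>, list the fragments (surface ids) its segments come from."""
    root = ET.fromstring(xml_text)
    # Map each physical <line> id to the id of the <surface> that holds it.
    line_to_fragment = {}
    for surface in root.iter(TEI + "surface"):
        for line in surface.findall(TEI + "line"):
            line_to_fragment[line.get(XML_ID)] = surface.get(XML_ID)
    result = []
    for logical_line in root.iter(TEI + "l"):
        targets = [seg.get("corresp").lstrip("#")
                   for seg in logical_line.findall(TEI + "seg")]
        result.append([line_to_fragment[t] for t in targets])
    return result


print(fragments_per_logical_line(DOC))
```

The text is still transcribed twice in this sketch; the pointers only make the correspondence between the two views explicit and checkable.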

> You don't say whether these "fragments" are shards
> of pottery or bits of parchment or what

Various materials: scraps of birch bark and palm
leaf, pottery and metal shards, chips of wood.

> As an example of the latter choice, consider this:
>
> <sourceDoc>
> <zone>

One distinction that we try to draw is between
original surfaces and their original divisions (such
as columns), within which the text flows, and the
accidental divisions caused by damage, which cut
across the text. All the examples given for <zone> in
the Guidelines are for the former, but rereading its
definition it does sound like it could be used for
both, and we would then have to distinguish them by a
type attribute, and group all <zone>s belonging to
the same fragment using a suitable attribute (@group
does not seem to be allowed).
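
A minimal sketch of that idea, assuming @type separates original divisions from accidental ones and @n serves as the grouping key for a fragment. The values "column" and "fragment" and the use of @n for grouping are hypothetical choices made only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical @type values: "column" for an original division,
# "fragment" for an accidental division caused by damage.
SURFACE = """
<surface xmlns="http://www.tei-c.org/ns/1.0">
  <zone type="column" n="1"/>
  <zone type="column" n="2"/>
  <zone type="fragment" n="A"><line>bla di</line></zone>
  <zone type="fragment" n="B"><line>bla blo</line></zone>
  <zone type="fragment" n="A"><line>da dim</line></zone>
  <zone type="fragment" n="B"><line>dum</line></zone>
</surface>
"""


def lines_by_fragment(xml_text):
    """Group line texts by the @n of the enclosing type="fragment" zone."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for zone in root.iter(TEI + "zone"):
        if zone.get("type") != "fragment":
            continue  # skip original divisions such as columns
        for line in zone.findall(TEI + "line"):
            grouped[zone.get("n")].append(line.text)
    return dict(grouped)


print(lines_by_fragment(SURFACE))
```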

All best,
Stefan


Re: Document‐centric encoding

James Cummings-5

Hi Stefan,


Re: Zone


I think it can indeed be used in the way you wish. To group zones that belong to the same surface you can use the <surface> element (or, for a group of different surfaces, the <surfaceGrp> element). Would that work in this case?


If there are not enough examples using it in the way you want, do feel free to submit additional examples at https://github.com/TEIC/TEI/issues since expanding the examples in the Guidelines is almost always a good thing.


Best wishes,

James 


--

Dr James Cummings, [hidden email]

School of English Literature, Language, and Linguistics, Newcastle University


Re: Document‐centric encoding

Gerrit Brüning
In reply to this post by Stefan Baums
> Yes, and we want to encode both aspects of it, are thinking of factoring them
> out into two sections, and pondering how to link the material in both
> sections.

Dear Stefan,

If you are thinking about making two transcriptions of the same source, this JTEI article may be of interest to you:
http://journals.openedition.org/jtei/697
I am not sure whether there is any other project in which <sourceDoc> and <text> were both put into practice.
In any case, note that a good deal of documentary information may well be recorded within a <text>-based encoding, and that the "document-focussed" encoding (<sourceDoc> etc.) comes with some restrictions on information that is regarded as too "interpretive". For example, <subst> is not allowed within <line>.

Best,

Gerrit

---
Dr. Gerrit Brüning
Freies Deutsches Hochstift | Historisch-kritische Edition von Goethes Faust | beta.faustedition.net
Goethe-Universität Frankfurt am Main | Institut für deutsche Literatur und ihre Didaktik | IG-Hochhaus 1.155


> -----Original Message-----
> From: TEI (Text Encoding Initiative) public discussion list [mailto:TEI-
> [hidden email]] On Behalf Of Stefan Baums
> Sent: Thursday, January 11, 2018 8:19 PM
> To: [hidden email]
> Subject: Re: Document‐centric encoding
>
> [...]

Re: Document‐centric encoding

Stefan Baums
Dear Gerrit,

thank you very much for the pointer to your article
in JTEI, which I read with pleasure and profit! I
found it interesting that in your last section, you
also seem to come to the conclusion that some
software support may be needed for the management of
multiple encodings of the same document/text to
ensure consistency. In our case, I do not think that
CollateX would be the solution, but we have been
thinking – on top of schema constraints in a good
editor – of a ‘lint’ tool to run over our files
periodically to check for dangling pointers and other
inconsistencies, plus the checks made possible in our
case by the planned round‐tripping between TEI and a
relational database with a well‐defined structure.
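
A minimal sketch of what such a ‘lint’ check for dangling pointers might look like, assuming that local pointers are written as "#id" in attributes such as @corresp, @facs and @target. The attribute list, the function name and the sample document are assumptions made for illustration; a real tool would need to cover whatever pointer attributes the project actually uses.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumed set of pointer attributes to check; extend as needed.
POINTER_ATTRS = ("corresp", "facs", "target")


def dangling_pointers(xml_text):
    """Return (element name, attribute, pointer) for each local pointer
    of the form #id that matches no xml:id in the same document."""
    root = ET.fromstring(xml_text)
    known_ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    problems = []
    for el in root.iter():
        for attr in POINTER_ATTRS:
            # Pointer attributes may hold several space-separated values.
            for pointer in el.get(attr, "").split():
                if pointer.startswith("#") and pointer[1:] not in known_ids:
                    problems.append((el.tag.split("}")[-1], attr, pointer))
    return problems


# Hypothetical sample: "#B1" points at a line that was never encoded.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <sourceDoc>
    <surface xml:id="fragA"><line xml:id="A1">bla di</line></surface>
  </sourceDoc>
  <text><body>
    <l><seg corresp="#A1">bla di</seg> <gap/> <seg corresp="#B1">bla blo</seg></l>
  </body></text>
</TEI>
"""

print(dangling_pointers(SAMPLE))
```

Pointers into other files are ignored by this sketch, since only values beginning with "#" are checked.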

All best,
Stefan


Re: Document‐centric encoding

Stefan Baums
In reply to this post by James Cummings-5
Dear James,

> To group zones that belong to the same surface you
> can use the <surface> element (or if a group of
> different surfaces the <surfaceGrp> element). Would
> that work in this case?

I think that could indeed work well. But if we go
this way, then it may make more sense to use
<surface> rather than <zone> within the documentary
encoding to enclose fragments. I note with delight that
<surfaceGrp> can be nested, for something like

   <surfaceGrp> <!-- a manuscript -->
     <surfaceGrp> <!-- a folio of this manuscript -->
       <surface> <!-- a fragment of this folio -->
         <line>
           bla bla
         </line>
         <line>
           de dum
         </line>
       </surface>
       <surface> <!-- another fragment of this folio -->
         <line>
           ho hum
         </line>
       </surface>
     </surfaceGrp>
   </surfaceGrp>
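
To illustrate how such a nested structure could be processed, here is a short sketch that walks the <surfaceGrp> hierarchy and reports each fragment with its position in it. The @n labels ("manuscript", "folio", "fragment-1", ...) are hypothetical additions to the example above, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Same shape as the example above, with hypothetical @n labels added.
NESTED = """
<sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
  <surfaceGrp n="manuscript">
    <surfaceGrp n="folio">
      <surface n="fragment-1">
        <line>bla bla</line>
        <line>de dum</line>
      </surface>
      <surface n="fragment-2">
        <line>ho hum</line>
      </surface>
    </surfaceGrp>
  </surfaceGrp>
</sourceDoc>
"""


def fragment_lines(element, path=()):
    """Yield (path of @n labels, [line texts]) for each <surface>, depth first."""
    for child in element:
        label = path + (child.get("n"),)
        if child.tag == TEI + "surfaceGrp":
            # Descend into nested groups, carrying the path along.
            yield from fragment_lines(child, label)
        elif child.tag == TEI + "surface":
            texts = [line.text.strip() for line in child.findall(TEI + "line")]
            yield label, texts


for path, lines in fragment_lines(ET.fromstring(NESTED)):
    print(" / ".join(path), lines)
```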

> If there are not enough examples using it in the
> way you want, do feel free to submit additional
> examples at https://github.com/TEIC/TEI/issues
> since expanding the examples in the Guidelines is
> almost always a good thing.

Once we settle on our encoding and it finds approval,
we would love to do so.

All best,
Stefan
