Nesting TEI and deprecation of teiCorpus

Nesting TEI and deprecation of teiCorpus

Scholger, Martina (martina.scholger@uni-graz.at)

Dear TEI Community,

 

We would like to request your feedback on the following issue:

 

Some time ago a proposal was submitted to introduce a new <standOff> element [1] (see https://github.com/laurentromary/stdfSpec) and a Council subgroup was assigned to examine the issue. The proposal permitted <teiHeader> as a child of the new <standOff> element, as the proposers thought it important to closely tie metadata to the stand-off annotations. The Council objected to this idea (of having <teiHeader> be a child of the new <standOff> since that would permit <teiHeader> in many unexpected places) and also wanted to explore the relationship between the stand-off annotations being recommended and the Open Annotation Data Model (http://www.openannotation.org/spec/core/).

 

We came up with the following solution to the metadata problem: in an upcoming release <TEI> will be a member of model.resourceLike, and thus be allowed to nest within <TEI>. The result is that a <teiHeader> can be closely bound to the <standOff> element by being a child of the same <TEI>, whether the <standOff> is used as a child of a nested <TEI> structure or a free-standing TEI document. The (desirable) side-effect of this is that <teiCorpus> is no longer needed. Therefore, we suggest deprecating <teiCorpus> for a period of at least three years.

 

Before we move on and deprecate <teiCorpus> we ask you to share your opinions on our suggestion. Please let us know if you see any substantial problem with removing <teiCorpus> in the long run and using nested <TEI> elements instead. This might allow something like:

 

====

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- Overall metadata for collection as a whole -->
  </teiHeader>

  <TEI n="text1">
    <teiHeader>
      <!-- Metadata for text1 -->
    </teiHeader>
    <text><!-- encoded transcription of text1 here --></text>
  </TEI>

  <TEI n="text2">
    <teiHeader>
      <!-- Metadata for text2 -->
    </teiHeader>
    <text><!-- encoded transcription of text2 here --></text>
  </TEI>

  <!-- more <TEI> elements -->
</TEI>

====


[1] <standOff> container element for stand-off annotations (comprising manual or machine-based annotations, as well as contextual information or linked data) embedded in a TEI document, see
https://hal.inria.fr/hal-01374102

 

 

Best wishes,

Martina Scholger

(on behalf of the TEI Technical Council)

 

 

 

Martina Scholger

Zentrum für Informationsmodellierung – Austrian Centre for Digital Humanities

Elisabethstraße 59/III

8010 Graz, Austria

[hidden email]

https://informationsmodellierung.uni-graz.at | https://gams.uni.graz.at

 

Chair of the TEI Technical Council | http://tei-c.org

Institut für Dokumentologie und Editorik e.V. | https://i-d-e.de


Re: Nesting TEI and deprecation of teiCorpus

Roberto Rosselli Del Turco-2
Dear Martina,
from what you wrote, nested <TEI> elements are fully equivalent to, and a
more than acceptable substitute for, <teiCorpus>, so no objection to
that on my part.

I would like a clarification about <standOff>, though: it is not
immediately clear from your examples, but it would be placed at the same
hierarchical level as <teiHeader>, <facsimile> and <text>, is that
correct?

The linked slides still present <teiHeader> and <facsimile> alongside
<listAnnotation> as permitted elements inside <standOff> (slide 13);
I guess that won't be true any more? What would be the finalised
content model for <standOff>? Also, is there some overlap between
<standOff> and <xenoData> if the former is going to hold Open Annotation
data in other formats? Apologies in advance if this is all detailed in
the spec; I am going to have a look later in the day.

Thank you in advance,

R


--

Roberto Rosselli Del Turco   roberto.rossellidelturco at unito.it
Dip. di Studi Umanistici     roberto.rossellidelturco at fileli.unipi.it
Universita' di Torino        VBD: http://vbd.humnet.unipi.it/beta2/
EVT: http://bit.ly/24D9kdE   VC: http://www.visionarycross.org/

  Hige sceal the heardra,     heorte the cenre,
  mod sceal the mare,       the ure maegen litlath.  (Maldon 312-3)

<shamelessPlug>Holidays in Tuscany http://www.imoricci.it/</shamelessPlug>

Re: Nesting TEI and deprecation of teiCorpus

lou
In reply to this post by Scholger, Martina (martina.scholger@uni-graz.at)
Dear Martina

It's not clear to me whether this change is an essential part of implementing the long-overdue standoff proposals, or simply a bright idea that has surfaced in the process of considering that task. If it is the former, why? If it is the latter, is it really worth the disruption such a backwards-incompatible, Birnbaum-infringing change would introduce?

Lou



Re: Nesting TEI and deprecation of teiCorpus

Hugh Cayless-2
Bit of both, I think. And the question of whether to formally deprecate or just gently discourage the use of teiCorpus is what we hope the list will help answer. 

Hugh

Sent from my phone. 

On Jun 28, 2019, at 11:35, Lou Burnard <[hidden email]> wrote:

Dear Martina

It's not clear to me whether this change is an essential part of implementing the long-overdue standoff proposals, or simply a bright idea that has surfaced in the process of considering that task. If it is the former, why? If it is the latter, is it really worth the disruption such a backwards-incompatible, Birnbaum-infringing change would introduce?

Lou



Re: Nesting TEI and deprecation of teiCorpus

Sewell, David R. (drs2n)
In reply to this post by Scholger, Martina (martina.scholger@uni-graz.at)
We have legacy projects that generate document concatenations using teiCorpus.
If nested TEI elements were to be adopted, would it be sufficient simply to
rename "teiCorpus" as "TEI" to be conformant?

--
David Sewell
Manager of Digital Initiatives
The University of Virginia Press
Email: [hidden email]   Tel: +1 434 924 9973
Web: http://www.upress.virginia.edu/rotunda

Re: Nesting TEI and deprecation of teiCorpus

Syd Bauman-10
Hi David!

Great question. No change should be needed at all to be conformant
against the new P5 that allows nesting <TEI>. But if the proposal to
remove the newly redundant <teiCorpus> is adopted,[1] then yes, I
expect that in the simple case (which likely will be the vast
majority), just renaming <teiCorpus> to <TEI> will be sufficient. I
suspect most folks will be able to get away with something like
  $ perl -p -e -i 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]
or doing the equivalent in their text editor (a trivial task in
oXygen, Emacs, BBEdit, and dozens of others). Some folks will have
put the string "<teiCorpus" or "</teiCorpus" in CDATA marked
sections, PIs, or in comments, and thus may have to do more work to
exclude those, if desired. Some folks may have the string "teiCorpus"
(not preceded by '<' or '</') in comments or prose, and may have to
do more work to change those, if desired. (E.g., lots of TEI
documentation has "<gi>teiCorpus</gi>" in it.)
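For anyone who would rather not hand-check comments, CDATA marked sections, and PIs after a blanket search-and-replace, the same rename can be sketched in a few lines of Python. This is only an illustration, not a tested migration tool: the function name `rename_tei_corpus` is made up for the example, and it deliberately leaves everything inside comments, CDATA sections, and processing instructions untouched.

```python
import re

# Either a span that must be left alone (comment, CDATA marked section,
# processing instruction) or the opening of a <teiCorpus> start/end tag.
_PATTERN = re.compile(
    r"(<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>)"  # group 1: protected span
    r"|(</?)teiCorpus(?=[\s/>])",                  # group 2: '<' or '</'
    re.DOTALL,
)


def rename_tei_corpus(xml_text: str) -> str:
    """Rename <teiCorpus> tags to <TEI>, skipping comments, CDATA and PIs."""

    def swap(match: re.Match) -> str:
        if match.group(1) is not None:
            # Protected span: copy it through unchanged.
            return match.group(1)
        # Real tag: keep '<' or '</' and substitute the element name.
        return match.group(2) + "TEI"

    return _PATTERN.sub(swap, xml_text)


if __name__ == "__main__":
    sample = (
        "<!-- formerly a <teiCorpus> -->"
        '<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader/><TEI/></teiCorpus>"
    )
    print(rename_tei_corpus(sample))
```

The lookahead after the element name is there so that a hypothetical element whose name merely begins with "teiCorpus" is not renamed by accident.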

Some people may want to retain the information that what is now a
<TEI> element is a corpus, and thus might want to add type="corpus".
(This is for the folks who don't want to bother just testing the XPath
".//TEI" -- if true, then I'm a corpus <TEI>; if false, I am not.)
If you have no @type attributes on your <teiCorpus> elements, this is
pretty easy. If you already have @type attributes (or worse, both
@type and @subtype), you have to decide what to do with the existing
attributes.
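To make the ".//TEI" test concrete, here is a minimal sketch using only Python's standard library. The helper name `is_corpus` is invented for this example, and it assumes the documents are in the TEI namespace.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def is_corpus(tei_element: ET.Element) -> bool:
    """True if this <TEI> element has at least one <TEI> descendant."""
    # './/' searches descendants only, so the element never matches itself.
    return tei_element.find(f".//{{{TEI_NS}}}TEI") is not None


if __name__ == "__main__":
    outer = ET.fromstring(
        f'<TEI xmlns="{TEI_NS}"><teiHeader/>'
        '<TEI n="text1"><teiHeader/><text/></TEI></TEI>'
    )
    inner = outer.find(f"{{{TEI_NS}}}TEI")
    print(is_corpus(outer), is_corpus(inner))
```

With a test like this available, an explicit type="corpus" is a convenience for downstream tools rather than a necessity.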

Appended below is XSLT that I (Syd) whipped up to handle the generic
case. We (the TEI Council) expect to make a stylesheet like this
public if and when <teiCorpus> is deprecated, with a major difference
-- the one the Council makes public will have been carefully tested.
Mine is just slapped together, so use at your own risk. :-)

Changing an ODD file to support a corpus encoded with <TEI> instead
of <teiCorpus> should be quite easy. (Just rebuilding against the new
version of P5 should do the trick; but you probably want to remove
the <teiCorpus> element if you no longer have any of 'em.)

All that said, while changing instance documents to use <TEI> instead
of <teiCorpus> will be trivial, and changing ODD customizations or
XSLT (or XQuery) processes that read in documents that once had
<teiCorpus> and now have <TEI> should be easy, changing non-XML aware
software and project documentation and practices may not be so easy.

--------- begin <teiCorpus> -> <TEI> routine ---------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  exclude-result-prefixes="#all"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="3.0">

  <!-- <teiCorpus> to <TEI>; copyleft 2019 Syd Bauman and the
       Northeastern University Digital Scholarship Group.
       Barely tested; use at your own risk. -->
 
  <!-- Simple identity transform, but put in a newline between nodes
       that are children of root just for aesthetics. -->
  <xsl:template match="node()">
    <xsl:if test="not(ancestor::*)">
      <xsl:text>&#x0A;</xsl:text>
    </xsl:if>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
 
  <xsl:template match="teiCorpus">
    <!-- If I am the outermost element and there are nodes (like the
         xml-model PIs), add a newline before me just for aesthetics.
         -->
    <xsl:if test="not(ancestor::*) and preceding::node()">
      <xsl:text>&#x0A;</xsl:text>
    </xsl:if>
    <!-- Output me (a <teiCorpus>) as a <TEI> -->
    <xsl:element name="TEI">
      <!-- Values of @type and @subtype on output depend on what they
           were on input: -->
      <xsl:choose>
        <!-- Neither @type nor @subtype -->
        <xsl:when test="not( @type | @subtype )">
          <xsl:attribute name="type" select="'corpus'"/>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:when>
        <!-- Only @type -->
        <xsl:when test="@type and not( @subtype )">
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="type" select="'corpus'"/>
          <xsl:attribute name="subtype" select="@type"/>
          <xsl:apply-templates select="node()"/>
        </xsl:when>
        <!-- @subtype (including both when there is and is not an @type) -->
        <xsl:when test="@subtype">
          <xsl:if test="not( @type )">
            <xsl:message select="'WARNING: Invalid attrs on &lt;teiCorpus> number '||count( preceding::teiCorpus )+1"/>
          </xsl:if>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="type" select="'corpus'"/>
          <xsl:attribute name="subtype" select="@type||'.'||@subtype"/>
          <xsl:apply-templates select="node()"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Huh? That was all 4 possibilities. -->
          <xsl:message select="'ERROR: internal logic error on &lt;teiCorpus> number '||count( preceding::teiCorpus )+1"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:element>
  </xsl:template>
 
</xsl:stylesheet>
--------- end <teiCorpus> -> <TEI> routine ---------

Note
----
[1] And remember, if it is adopted, <teiCorpus> will not just
    disappear tomorrow; it will be deprecated for several years
    first, giving everyone plenty of time to make changes.

> We have legacy projects that generate document concatenations using
> teiCorpus. If nested TEI elements were to be adopted, would it be
> sufficient simply to rename "teiCorpus" as "TEI" to be conformant?

Re: Nesting TEI and deprecation of teiCorpus

Piotr Bański
In reply to this post by lou
Dear Martina and all,

Like Lou, I fail to clearly see the connection between the elimination
of teiCorpus and the standOff case. I also completely fail to see how
the proposed change helps the encoder (as opposed to the Council) in
handling the standoff-specific metadata locally.

For one thing, standOff and teiCorpus do not depend on one another in
any way.

For another, one standOff annotation layer per the nesting TEI element
is to be seen as a special case. When you go standoff, you often do that
because you need to encode more than one layer of annotation, and you
can't do that inside the vanilla TEI/text hierarchy without special tricks.

Standoff layers may be (and typically are) produced by different tools
(and recording that is relevant for reproducibility) and typically need
to document the annotation structure and content (tagsets evolve over
time). That is why having a header that is local to the standOff element
has always been so important. Eliminating teiCorpus and allowing TEI to
self-nest does not improve matters in this respect at all.

Best regards,

   Piotr


On 6/28/19 12:35 PM, Lou Burnard wrote:

> Dear Martina
>
> It's not clear to me  whether this change is an essential part of
> implementing the long-overdue standoff proposals, or simply a bright
> idea that has surfaced in the process of considering that task. If it is
> the former, why? If it is the latter,  is it really worth the disruption
> such a backwards-incompatible Birnbaum-infringing change would introduce?
>
> Lou

Re: Nesting TEI and deprecation of teiCorpus

C. M. Sperberg-McQueen
In reply to this post by Scholger, Martina (martina.scholger@uni-graz.at)
> On 28,Jun2019, at 2:36 AM, Scholger, Martina ([hidden email]) <[hidden email]> wrote:
>
> Dear TEI Community,
>  
> We would like to request your feedback on the following issue:
>  
> Some time ago a proposal was submitted to introduce a new <standOff> element [1] (see https://github.com/laurentromary/stdfSpec) and a Council subgroup was assigned to examine the issue. The proposal permitted <teiHeader> as a child of the new <standOff> element, as the proposers thought it important to closely tie metadata to the stand-off annotations. The Council objected to this idea (of having <teiHeader> be a child of the new <standOff> since that would permit <teiHeader> in many unexpected places) and also wanted to explore the relationship between the stand-off annotations being recommended and the Open Annotation Data Model (http://www.openannotation.org/spec/core/).
>  
> We came up with the following solution to the metadata problem: in an upcoming release <TEI> will be a member of model.resourceLike, and thus be allowed to nest within <TEI>. The result is that a <teiHeader> can be closely bound to the <standOff> element by being a child of the same <TEI>, whether the <standOff> is used as a child of a nested <TEI> structure or a free-standing TEI document. The (desirable) side-effect of this is that <teiCorpus> is no longer needed. Therefore, we suggest to deprecate <teiCorpus> for a period of at least three years.  
>  
> Before we move on and deprecate <teiCorpus> we ask you to share your opinions on our suggestion. Please let us know if you see any substantial problem with removing <teiCorpus> in the long run and using nested <TEI> elements instead.

The problems I see are unnecessary (and it seems to me pointless) backward incompatibility and the loss of an easy hook for better validation of corpora.

The change appears unnecessary because the existence of the teiCorpus element does not appear (on the face of it, based on the summary offered) to be doing any harm.

The change appears pointless because no rationale is offered, and the obvious rationale (we wish to deprecate and eliminate all non-essential elements) is so starkly at variance with the rest of the TEI design.

The change appears to make it harder to impose obvious validity requirements. A project might wish to say, for example, that anything tagged as a ‘teiCorpus’ must contain at least two texts, or in some projects at least n texts, for n > 2. If I were working on a corpus project I would almost certainly want to impose validity checks on the usage of various attributes invented for use in corpora. This is straightforward using ‘teiCorpus’ and ‘TEI’ elements (assuming the project is happy with a relatively flat corpus structure); it is possible to do it in XSD and RNG with two contextually distinct declarations for ‘TEI’, but pure-ODD schemas don’t support context-varying declarations, so as far as I can see it’s impossible to do in pure ODD without the use of Schematron (which means in turn that schema-aware editors will find it impossible to use the constraint to guide the user in context).
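To illustrate the kind of constraint meant here, the "at least n texts" rule can be sketched as a check run outside the schema. The Python below is purely illustrative: the function name `undersized_corpora` and the default threshold are assumptions for the example, and, like Schematron, a script of this kind cannot guide a schema-aware editor in context.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = f"{{{TEI_NS}}}TEI"
TEI_CORPUS = f"{{{TEI_NS}}}teiCorpus"


def undersized_corpora(root: ET.Element, min_texts: int = 2) -> list:
    """Return corpus-like elements with fewer than min_texts member texts.

    An element counts as corpus-like if it is a <teiCorpus>, or a <TEI>
    that has at least one <TEI> or <teiCorpus> child.
    """
    found = []
    for element in root.iter():
        if element.tag not in (TEI, TEI_CORPUS):
            continue
        members = [c for c in element if c.tag in (TEI, TEI_CORPUS)]
        corpus_like = element.tag == TEI_CORPUS or bool(members)
        if corpus_like and len(members) < min_texts:
            found.append(element)
    return found


if __name__ == "__main__":
    doc = ET.fromstring(
        f'<TEI xmlns="{TEI_NS}"><teiHeader/>'
        "<TEI><teiHeader/><text/></TEI></TEI>"
    )
    print(len(undersized_corpora(doc)))
```

Note that with nested <TEI> the check has to infer which elements are corpora from their children, whereas with <teiCorpus> the element name states it directly.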

How important one thinks these problems are is likely to depend on one’s views about backward compatibility, design consistency, and validation.  I do not have the impression that everyone interested in the TEI has the same views on those topics.  But speaking for myself, my reaction to the proposal is:  please don’t.

best,

Michael Sperberg-McQueen


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: Nesting TEI and deprecation of teiCorpus

Syd Bauman-10
In reply to this post by Syd Bauman-10
CORRECTION:

>   $ perl -p -e -i 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]

Should, of course, read

    $ perl -p -i -e 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]

(I had inadvertently swapped the -i and -e switches ... order matters
here because the statement "s,abc,xyz,g;" is the argument to the -e
switch.)
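
For anyone without Perl to hand, the same textual substitution can be sketched in Python. This is an assumed equivalent rather than a drop-in replacement for the one-liner: the function name `rename_corpus_tags` is hypothetical, and it works on a string rather than editing files in place. Like the Perl version, it rewrites any text matching the pattern (including inside comments) rather than parsing the XML.

```python
import re

# Same pattern as the Perl one-liner: "<" or "</" immediately followed by "teiCorpus".
_CORPUS_TAG = re.compile(r"(</?)teiCorpus")


def rename_corpus_tags(xml_text):
    """Replace teiCorpus start and end tags with TEI, keeping attributes untouched."""
    # \g<1> is the captured "<" or "</"; the explicit form avoids any
    # ambiguity about where the group reference ends.
    return _CORPUS_TAG.sub(r"\g<1>TEI", xml_text)
```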

Re: Nesting TEI and deprecation of teiCorpus

Peter Flynn-8
In reply to this post by Scholger, Martina (martina.scholger@uni-graz.at)
On 28/06/2019 09:36, Scholger, Martina ([hidden email]) wrote:
> Dear TEI Community,
> We would like to request your feedback on the following issue:
>
> Some time ago a proposal was submitted to introduce a new <standOff>
> element [1] (see https://github.com/laurentromary/stdfSpec)

Could someone please post on TEI-L a short (1–para) explanation of what
<standOff> is intended to do? I feel that it is important that in future it
can be found in the TEI-L archives directly, rather than having to
fossick through a (potentially non-existent) github site.

I cannot see anything in the github site that explains what it's for (my
apologies if I have missed something here). The INRIA paper explains some
of it quite well, so perhaps an edited form of that Abstract would do.

Essentially, <standOff> appears mostly harmless, but I cannot see why
implementing it should involve the deprecation of the TEI's outermost
container (again, I am probably missing something, but I cannot see what).

> [snip] The (desirable) side-effect of this is that <teiCorpus> is no
> longer needed. Therefore, we suggest to deprecate <teiCorpus> for a
> period of at least three years.

I think before we do this, we should explain some aspects more clearly:

a. why is it desirable to deprecate <teiCorpus>? I am missing the
argument for this: the statement appears to be a non-sequitur as it
currently stands.

b. independently, why is <teiCorpus> no longer needed? If I am creating
a corpus as a monolithic document, I will need a container for multiple
<TEI> elements, and the name <teiCorpus> seems to me like A Good Idea.

c. Without a clearer idea of what <standOff> is intended to achieve, I
cannot comment on why introducing it should obviate the need for
<teiCorpus>.

> Before we move on and deprecate <teiCorpus> we ask you to share your
> opinions on our suggestion.

I would echo Michael on this: please don't.

> Please let us know if you see any substantial problem with removing
> <teiCorpus> in the long run and using nested <TEI> elements instead.
> This might allow something like:
[snip]

The most obvious problem is that this does not name the corpus container
accurately. One of the big advantages of TEI has always been that it
allows encoders to call things what they are. Breaking that model needs
extraordinary evidence of the good it will do. P5 broke some things that
(IMNSHO) didn't need no breakin' and I am unconvinced that this pattern
should continue.

Peter

Re: Nesting TEI and deprecation of teiCorpus

Paul Schaffner
In reply to this post by Syd Bauman-10
Or in Windows cmd-speak (since at least ActivePerl for Windows, and maybe
others, won't allow -i without a specified extension, won't accept single
quotes, and won't expand file lists via wildcard without a batch wrapper to
do it)...:

FOR %%a IN (*.xml) DO perl -i.bak -p -e "s,(</?)teiCorpus,$1TEI,g;" %%a

That's probably wrong, and please don't ask me to say it in PowerShell.

pfs

On Fri, Jun 28, 2019, at 14:22, Syd Bauman wrote:

> CORRECTION:
>
> >   $ perl -p -e -i 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]
>
> Should, of course, read
>
>     $ perl -p -i -e 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]
>
> (I had inadvertently swapped the -i and -e switches ... order matters
> here because the statement "s,abc,xyz,g;" is the argument to the -e
> switch.)
>

--
Paul Schaffner  Digital Content & Collections
University of Michigan Libraries
[hidden email] | http://www.umich.edu/~pfs/

Re: Nesting TEI and deprecation of teiCorpus

James Cummings-5

Hi all, 

Ignoring for the time being ways to cope with a potential change from using <teiCorpus> vs nested <TEI> elements, I thought it might help to have someone who has only been partly involved in the discussions comment to reiterate the underlying issues and what advice from the community is being sought. (I've not been part of the sub-group looking at the standoff proposal, and only investigated these issues when they've been brought to the technical council's discussions.)

There are generally three issues that are being discussed:

1) There is a proposal for a container for <standOff> annotation (generalised to be of all sorts, not perhaps just those envisioned by its creators), which requested its own <teiHeader>. My recollection is that the technical council decided allowing the <teiHeader> in this context was not the best idea, preferring to restrict it to inside <TEI> and <teiCorpus> elements. However, in doing so it decided that if <TEI> elements were able to nest then the <standOff> element would be tightly bound enough to its sibling <teiHeader> to answer their requirements. Whether that is suitable or not as a solution is part of the discussion of the standOff proposal.

2) In coming up with the proposed solution of having <TEI> claim membership of model.resourceLike (and thus be able to sit beside <text>s and <facsimile>s) it means that <TEI> elements would be able to nest. Although I believe this was already agreed, I'm sure the technical council is willing to re-open discussion on it if there is an outpouring of angst saying this should not ever be allowed. (And remember, it would be easy for you to _remove_ this ability from any of your TEI schemas.)

3) What I believe the original email was truly about, and asking for comment on, is whether, if and when <TEI> elements are able to nest, there is still a need for a <teiCorpus> element. The idea here is that the TEI is often criticised for having multiple ways of doing things, and this would be a way to tidy up such a redundancy at the point it was being created. (Usually the multiple ways of doing things exist precisely because multiple ways are needed.) (And remember, if you are using an older schema of the TEI for your project there is nothing necessarily forcing you to upgrade to one without <teiCorpus>. Or if using nested <TEI> elements it is easy in your processing to know which is the outermost.)
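
The point about recognising the outermost <TEI> in processing can be sketched as follows, in Python with the standard library only. This assumes nested <TEI> elements in the TEI namespace, as proposed; the function name `tei_depths` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_TAG = "{http://www.tei-c.org/ns/1.0}TEI"


def tei_depths(xml_text):
    """Return the nesting depth of each TEI element in document order (0 = outermost)."""
    root = ET.fromstring(xml_text)
    depths = []

    def walk(elem, depth):
        if elem.tag == TEI_TAG:
            depths.append(depth)
            depth += 1  # anything below this element is one TEI level deeper
        for child in elem:
            walk(child, depth)

    walk(root, 0)
    return depths
```

A depth of 0 marks the element playing the role that <teiCorpus> plays today; anything deeper is a member of the collection.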

The answers to number 3  then seem to fall into a few general categories:

a) Reject the proposal for having <TEI> elements able to nest, burn it with fire

b) Accept having nested <TEI> elements and in doing so accept that having both nesting <TEI> elements and an outer <teiCorpus> element is redundant, deprecate <teiCorpus>, though take a long time in doing so since it is such a major change

c) Accept having nested <TEI> elements but conservatively keep the existing <teiCorpus> but de-exemplify it (i.e. gently recommend that nesting <TEI> might be better for some things, change most examples to follow this practice, and include prose about when you would use it or not)

d) Accept having nested <TEI> elements but conservatively keep the existing <teiCorpus> but do not de-exemplify it (and include prose about when you would use it or not)

There are some people who have already voiced preferences for some of these, but I hope that helps clarify it (at least as I see it, I'm happy to be corrected).

Many thanks,

James 


--

Dr James Cummings, [hidden email]
Senior Lecturer in Late-Medieval Literature and Digital Humanities

School of English, Newcastle University


From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Paul Schaffner <[hidden email]>
Sent: 28 June 2019 20:19
To: [hidden email]
Subject: Re: Nesting TEI and deprecation of teiCorpus
 
[snip]

Re: Nesting TEI and deprecation of teiCorpus

lou
Thank you James for trying to explain this a bit more fully. I think I get the drift, but now I am even more against the idea of deprecating teiCorpus. Necessarily, because I don't like the idea of allowing TEI to self-nest one little bit. What is the justification for that? You say there's a desire to allow for close coupling between a teiHeader and a <standOff> (or whatever the thing is called), but there are surely other ways of achieving that without wrecking the entire TEI architecture! You could have a special standOffMeta as child of standOff, you could permit teiHeader within standOff. This seems so obvious I must be missing something ...

Considering a TEI element as a "resource" in the model.resourceLike sense makes no sense to me-- TEI is defined as the combination of a header and one or more resources: why muddy this water?


On Mon, 1 Jul 2019 at 16:31, James Cummings <[hidden email]> wrote:

[snip]

Re: Nesting TEI and deprecation of teiCorpus

Hugh Cayless-2
To try to clarify a bit further: the reason for wanting <standOff> to be able to have a teiHeader is that there's a requirement to be able to represent several annotation campaigns for a single source, each of which would have its own metadata, including many, if not all, of the things that can go in a teiHeader. So the Council working group was convinced that the use of teiHeader was warranted.

We thought, though, that <standOff><teiHeader>...</teiHeader>...</standOff> would be a bit strange, and that nesting TEI instead was less weird. I suppose the question is whether to make <standOff> TEI-like (by giving it a teiHeader), or to make TEI resource-like, by allowing it to nest directly without a teiCorpus wrapper.

Just to elaborate on <standOff> a little more: we are envisioning this as a container both for annotations (the precise structure of which is still under discussion) and as a place to put structured data that might be linked to the text, e.g. lists of persons, places, and so on. One reason for preferring the <TEI><teiHeader>...</teiHeader><standOff>...</standOff></TEI> structure is that you could use that to publish (for example) a structured, standalone prosopography as a TEI document, and the structure of that prosopography would be the same as if it were embedded in a TEI document with a text, so it's consistent. It solves a problem of deciding what metadata is associated with <standOff>, because you always have only one place to look for that metadata.
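
The "only one place to look" rule can be sketched in Python: for each <standOff>, walk up to the nearest enclosing <TEI> and take that element's own <teiHeader>. This assumes nested <TEI> as proposed and the element name <standOff> from the original proposal; the function name `headers_for_standoff` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.tei-c.org/ns/1.0}"


def headers_for_standoff(xml_text):
    """Pair each standOff with the teiHeader of its nearest enclosing TEI."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for standoff in root.iter(NS + "standOff"):
        node = parent_of.get(standoff)
        while node is not None and node.tag != NS + "TEI":
            node = parent_of.get(node)
        # find() with a bare tag name looks at direct children only.
        header = node.find(NS + "teiHeader") if node is not None else None
        pairs.append((standoff, header))
    return pairs
```

The same lookup works whether the <standOff> sits in a free-standing TEI document or in a nested one, which is the consistency being argued for here.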

Personally, I'm neutral about the question of whether to retain teiCorpus. I don't use it, but there's no harm in keeping it, even if it becomes a bit redundant. We thought we should start by asking the community what you think :-).

All the best,
Hugh

On Mon, Jul 1, 2019 at 12:42 PM Lou Burnard <[hidden email]> wrote:
[snip]

Re: Nesting TEI and deprecation of teiCorpus

Martin Holmes
In reply to this post by lou
Hi Lou,

I have some sympathy with the nesting of TEI elements because I'm
finding that it's harder and harder to specify or understand what a
document is. In larger projects, what constitutes an individual file is
not necessarily mapped in any way to what we would traditionally think
of as a document; with all the includes, pointers, and other references
that spider-web out across a large collection of "documents", a single
file is no longer a coherent object with clear boundaries. The ability
to nest TEI elements either in one's original encoding or in packages of
XML generated from it would be helpful in many ways.

That said, I see no immediate need to deprecate teiCorpus myself; for
projects where there is a clear and obvious boundary between "documents"
and "collections of documents", it's intuitive and straightforward. If
we could have both TEI nesting and teiCorpus, the latter would be a sort
of syntactic sugar for a TEI element with TEI children.

Cheers,
Martin

On 2019-07-01 9:42 a.m., Lou Burnard wrote:

> [snip]

Re: Nesting TEI and deprecation of teiCorpus

Stuart A. Yeates
For me, teiCorpus and TEI are a bit like <div1>, <div2>, <div3>, ...

I understand that they represent distinctions which are very clear to
some people and map naturally to concepts long used in some academic
disciplines. I also understand that these differentiations make
abstraction (and thus many programming tasks) significantly harder.

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 2 Jul 2019 at 05:59, Martin Holmes <[hidden email]> wrote:

>
> Hi Lou,
>
> I have some sympathy with the nesting of TEI elements because I'm
> finding that it's harder and harder to specify or understand what a
> document is. In larger projects, what constitutes an individual file is
> not necessarily mapped in any way to what we would traditionally think
> of as a document; with all the includes, pointers, and other references
> that spider-web out across a large collection of "documents", a single
> file is no longer a coherent object with clear boundaries. The ability
> to nest TEI elements either in one's original encoding or in packages of
> XML generated from it would be helpful in many ways.
>
> That said, I see no immediate need to deprecate teiCorpus myself; for
> projects where there is a clear and obvious boundary between "documents"
> and "collections of documents", it's intuitive and straightforward. If
> we could have both TEI nesting and teiCorpus, the latter would be a sort
> of syntactic sugar for a TEI element with TEI children.
>
> Cheers,
> Martin
>
> On 2019-07-01 9:42 a.m., Lou Burnard wrote:
> > Thank you James for trying to explain this a bit more fully. I think I
> > get the drift, but now I am even more against the idea of deprecating
> > teiCorpus. Necessarily, because I don't like the idea of allowing TEI to
> > self-nest one little bit. What is the justification for that? You say
> > there's a desire to allow for close coupling between a teiHeader and a
> > <standOff> (or whatever the thing is called), but there are surely other
> > ways of achieving that without  wrecking the entire TEI architecture!
> > You could have a special standOffMeta  as child of standOff, you could
> > permit teiHeader within standOff. This seems so obvious I must be
> > missing something ...
> >
> > Considering a TEI element as a "resource" in the model.resourceLike
> > sense makes no sense to me-- TEI is defined as the combination of a
> > header and one or more resources: why muddy this water?
> >
> >
> > On Mon, 1 Jul 2019 at 16:31, James Cummings
> > <[hidden email] <mailto:[hidden email]>>
> > wrote:
> >
> >
> >     Hi all,
> >
> >     Ignoring for the time being ways to cope with a potential change
> >     from using <teiCorpus> vs nested <TEI> elements, I thought it might
> >     help to have someone who has only been partly involved in the
> >     discussions comment to reiterate the underlying issues and what
> >     advice from the community is being sought. (I've not been part of
> >     the sub-group looking at the standoff proposal, and only
> >     investigated these issues when they've been brought to the technical
> >     council's discussions.)
> >
> >     There are generally three issues that are being discussed:
> >
> >     1) There is a proposal for a container for <standoff> annotation
> >     (generalised to be of all sorts, not perhaps just those envisioned
> >     by its creators), which requested its own <teiHeader>. My
> >     recollection is that the technical council decided allowing the
> >     <teiHeader> in this context was not the best idea, preferring to
> >     restrict it to inside <TEI> and <teiCorpus> elements. However, in
> >     doing so decided if <TEI> elements were able to nest then the
> >     <standoff> element would be tightly-bound enough to its sibling
> >     <teiHeader> to answer their requirements. Whether that is suitable
> >     or not as a solution is part of the discussion of the standoff
> >     proposal.
> >
> >     2) In coming up with the proposed solution of having <TEI> claim
> >     membership of model.resourceLike (and thus be able to sit beside
> >     <text>s and <facsimile>s) it means that <TEI> elements would be able
> >     to nest. Although I believe this was already agreed, I'm sure the
> >     technical council is willing to re-open discussion on it if there is
> >     an outpouring of angst saying this should not ever be allowed. (And
> >     remember, it would be easy for you to _remove_ this ability from any
> >     of your TEI schemas.)
> >
> >     3) What I believe the original email was truly about, and asking for
> >     comment on, is whether if and when <TEI> elements are able to nest,
> >     there is still a need for a <teiCorpus> element. The idea being here
> >     that the TEI is often criticised for having multiple ways of doing
> >     things and this would be a way to tidy up such a redundancy at the
> >     point it was being created. (Usually the multiple ways of doing
> >     things exist precisely because multiple ways are needed.) (And
> >     remember, if you are using an older schema of the TEI for your
> >     project there is nothing necessarily forcing you to upgrade to one
> >     without <teiCorpus>. Or if using nested <TEI> elements it is easy in
> >     your processing to know this is the outermost.)
> >
> >     The answers to number 3  then seem to fall into a few general
> >     categories:
> >
> >     a) Reject the proposal for having <TEI> elements able to nest, burn
> >     it with fire
> >
> >     b) Accept having nested <TEI> elements and in doing so accept that
> >     having both nesting <TEI> elements and an outer <teiCorpus> element
> >     is redundant, deprecate <teiCorpus>, though take a long time in
> >     doing so since it is such a major change
> >
> >     c) Accept having nested <TEI> elements but conservatively keep the
> >     existing <teiCorpus> but de-exemplify it (i.e. gently recommend
> >     nesting <TEI> might be better for some things and change most
> >     examples to follow this practice, and include prose about when you
> >     would use it or not)
> >
> >     d) Accept having nested <TEI> elements but conservatively keep the
> >     existing <teiCorpus> but do not de-exemplify it (and include prose
> >     about when you would use it or not)
> >
> >     There are some people who have already voiced preferences for some
> >     of these, but I hope that helps clarify it (at least as I see it,
> >     I'm happy to be corrected).
> >
> >     Many thanks,
> >
> >     James
> >
> >
> >     --
> >
> >     Dr James Cummings, [hidden email]
> >     <mailto:[hidden email]>
> >     Senior Lecturer in Late-Medieval Literature and Digital Humanities
> >
> >     School of English, Newcastle University
> >
> >     ------------------------------------------------------------------------
> >     *From:* TEI (Text Encoding Initiative) public discussion list
> >     <[hidden email] <mailto:[hidden email]>> on
> >     behalf of Paul Schaffner <[hidden email]
> >     <mailto:[hidden email]>>
> >     *Sent:* 28 June 2019 20:19
> >     *To:* [hidden email] <mailto:[hidden email]>
> >     *Subject:* Re: Nesting TEI and deprecation of teiCorpus
> >     Or in Windows cmd-speak (since at least ActivePerl for Windows, and
> >     maybe
> >     others, won't allow -i without a specified extension, won't accept
> >     single
> >     quotes, and won't expand file lists via wildcard without a batch
> >     wrapper to
> >     do it)...:
> >
> >     FOR %%a IN (*.xml) DO perl -i.bak -p -e "s,<(/?)teiCorpus,<$1TEI,g;" %%a
> >
> >     That's probably wrong, and please don't ask me to say it in PowerShell.
> >
> >     pfs
> >
> >     On Fri, Jun 28, 2019, at 14:22, Syd Bauman wrote:
> >     > CORRECTION:
> >     >
> >     > >   $ perl -p -e -i 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]
> >     >
> >     > Should, of course, read
> >     >
> >     >     $ perl -p -i -e 's,(</?)teiCorpus,$1TEI,g;' [INPUT_FILES]
> >     >
> >     > (I had inadvertently swapped the -i and -e switches ... order matters
> >     > here because the statement "s,abc,xyz,g;" is the argument to the -e
> >     > switch.)
> >     >
> >
> >     --
> >     Paul Schaffner  Digital Content & Collections
> >     University of Michigan Libraries
> >     [hidden email] <mailto:[hidden email]> |
> >     http://www.umich.edu/~pfs/
> >

Re: Nesting TEI and deprecation of teiCorpus

Peter Flynn-8
In reply to this post by James Cummings-5
On 01/07/2019 16:31, James Cummings wrote:
> [...]
> However, in doing so decided if <TEI> elements
> were able to nest

I think that's the non sequitur that's tripping me up.
In what way does the introduction of <standOff> imply that <TEI>
elements can/should/will be nested?

> 2) In coming up with the proposed solution of having <TEI> claim
> membership of model.resourceLike (and thus be able to sit beside <text>s
> and <facsimile>s)

I think what I don't grok is why one would want to do such a thing.
I'm sure there are good reasons, but they are quite exceptionally
non-obvious.

> it means that <TEI> elements would be able to nest.
> Although I believe this was already agreed, I'm sure the technical
> council is willing to re-open discussion on it if there is an outpouring
> of angst saying this should not ever be allowed.

Not so much angst, more a vox de profundis saying "Beware".

> (And remember, it would
> be easy for you to _remove_ this ability from any of your TEI schemas.)

Certainly possible, for some value of 'easy'.

> 3) What I believe the original email was truly about, and asking for
> comment on, is whether if and when <TEI> elements are able to nest,
> there is still a need for a <teiCorpus> element.

Only for the purposes of identity. Names are strange things — people
assume they mean what they say. Tedious of them :-)

> The idea being here that the TEI is often criticised for having
> multiple ways of doing things
I've never regarded that as anything except an advantage.

> and this would be a way to tidy up such a redundancy at the point it
> was being created.
Actually it just transfers duplication to <TEI> which is no longer 'a
document' but possibly also 'a collection of documents'.

> There are some people who have already voiced preferences for some of
> these, but I hope that helps clarify it (at least as I see it, I'm happy
> to be corrected).

Thank you for the excellent clarifications. I'll continue to sit on the
fence — I'm retired and just helping out some sites, and I need to know
how to advise them.

Peter

Re: Nesting TEI and deprecation of teiCorpus

James Cummings-5
Hi Peter
(apologies for top posting, the web client I'm using won't do sensible quoting) 

Just a note to clarify that I'm not arguing in favour of any of these ways forward, indeed I'm probably one of the more stick-in-the-mud members of council, and was only trying to clarify the logical possibilities. I've not been directly involved with the development of the standoff proposal so others can comment on why nesting TEI solves (or does not) this problem. I believe there are those who wish to do this particular form of standoff (which is more specialised than any of my needs for it) in a more hierarchical way, and this also needs tightly bound metadata for which council did not want to include a teiHeader. Hugh gave a better example than me I believe.

As for removing the ability of TEI to self-nest: this would be an ODD customisation that removes <TEI> from the model.resourceLike class (assuming nesting is implemented as I understand it). I'd expect this to be a common customisation, so 'easy' in the sense of copying and pasting an example and regenerating a schema from the new Roma.

Yes, names are important and if nesting TEI elements went ahead then I'd lean towards keeping teiCorpus personally, even if it would be basically the same as <TEI type="corpus"> or something.
  

Many thanks,

James 


--

Dr James Cummings, [hidden email]
Senior Lecturer in Late-Medieval Literature and Digital Humanities

School of English, Newcastle University



From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Peter Flynn <[hidden email]>
Sent: 01 July 2019 22:58
To: [hidden email]
Subject: Re: Nesting TEI and deprecation of teiCorpus
 
[...]

Re: Nesting TEI and deprecation of teiCorpus

Syd Bauman-10
In reply to this post by Martin Holmes
I agree with Martin that there is quite a bit of utility in nesting
<TEI>. But furthermore, I think there is a certain elegance to the
plan, as it simplifies the TEI model writ large:

   A TEI document is a <teiHeader> followed by one or more resource
   things; if you want one of those resource things to have some of
   its own metadata, wrap it in a <TEI> and insert a <teiHeader> in
   front of it.

Further, allowing <TEI> to nest does not change the expressive power
of the system at all. Note that currently you can already express a
hierarchically nested "document" structure using nested <teiCorpus>,
e.g.:

  <teiCorpus>
    <teiHeader/>
    <text/>
    <teiCorpus>
      <teiHeader/>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
    </teiCorpus>
    <TEI>
      <teiHeader/>
      <text/>
    </TEI>
    <teiCorpus>
      <teiHeader/>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
    </teiCorpus>
  </teiCorpus>

With nested <TEI> elements, the exact same structure can be expressed:

  <TEI>
    <teiHeader/>
    <text/>
    <TEI>
      <teiHeader/>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
    </TEI>
    <TEI>
      <teiHeader/>
      <text/>
    </TEI>
    <TEI>
      <teiHeader/>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
      <TEI>
        <teiHeader/>
        <text/>
      </TEI>
    </TEI>
  </TEI>

with no loss in expressiveness at all. And this does solve the issue
that the <soCalled>stand-offers</> found important: to explicitly and
tightly associate metadata unique to a set of stand-off annotations
with them. E.g.:

  <TEI>
    <teiHeader/>
    <text/>
    <TEI>
      <teiHeader/>
      <standOff/>
    </TEI>
    <TEI>
      <teiHeader/>
      <standOff/>
    </TEI>
    <TEI>
      <teiHeader/>
      <standOff/>
    </TEI>
  </TEI>
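
A minimal Python sketch of the equivalence claimed above, assuming the documents are parsed with the standard library's xml.etree.ElementTree. Renaming every <teiCorpus> to <TEI> turns the first hierarchy into the second without changing its shape. The function name rename_corpus_to_tei is made up for illustration, and the sample input is a shortened version of the hierarchy above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def rename_corpus_to_tei(root):
    """Rename every teiCorpus element (namespaced or not) to TEI, in place."""
    for el in root.iter():
        if el.tag == "teiCorpus":
            el.tag = "TEI"
        elif el.tag == f"{{{TEI_NS}}}teiCorpus":
            el.tag = f"{{{TEI_NS}}}TEI"
    return root


# Shortened version of the nested <teiCorpus> hierarchy shown above.
corpus = ET.fromstring(
    "<teiCorpus><teiHeader/>"
    "<teiCorpus><teiHeader/><TEI><teiHeader/><text/></TEI></teiCorpus>"
    "<TEI><teiHeader/><text/></TEI>"
    "</teiCorpus>"
)
nested = rename_corpus_to_tei(corpus)
print(ET.tostring(nested, encoding="unicode"))
```

Only element names change; the parent/child relationships, and therefore the association of each header with its resources, stay exactly as they were.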


But that is all about the issue of <TEI> self-nesting. The main
controversial issue (in my mind, at least) is whether or not to keep
<teiCorpus> around once <TEI> can self-nest.

There is some truth to Piotr's accusation that the removal of
<teiCorpus> would be more for Council than for users. (Not really
just Council, of course, but anyone who wants or needs to read,
understand, edit, or play with the content model of <teiCorpus>; or
those who have to process generic TEI documents.)

The content model for <teiCorpus> in the "<TEI> self-nesting but
<teiCorpus> also available" universe would be something like

 element teiCorpus {
           (
             teiHeader,
             (
               ( model.resourceLike+, ( TEI | teiCorpus )* )
               |
               ( TEI | teiCorpus )+
             )
           )
         }

But in the "<TEI> self-nesting" universe, whether or not <teiCorpus>
and its somewhat complex content model are kept, the content model
for <TEI> is simplicity itself:

 element TEI { teiHeader, model.resourceLike+ }

And I do see a mild advantage for users and for teachers of TEI in
this clean simplicity: A <TEI> element is one or more resources
explicitly associated with its (their) metadata.
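
To make the simple content model concrete, here is a rough Python sketch of a recursive check against "teiHeader, model.resourceLike+". It is an illustration, not a replacement for a real schema: the RESOURCE_LIKE set is an assumed, abbreviated membership of model.resourceLike (with <TEI> and <standOff> added as proposed), namespaces are ignored, and is_valid_tei is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed, abbreviated membership of model.resourceLike once <TEI> joins it.
RESOURCE_LIKE = {"text", "facsimile", "standOff", "TEI"}


def is_valid_tei(el):
    """Check el against: element TEI { teiHeader, model.resourceLike+ }."""
    children = list(el)
    # A header plus at least one resource means at least two children.
    if el.tag != "TEI" or len(children) < 2:
        return False
    if children[0].tag != "teiHeader":
        return False
    for child in children[1:]:
        if child.tag not in RESOURCE_LIKE:
            return False
        # A nested <TEI> must itself satisfy the same content model.
        if child.tag == "TEI" and not is_valid_tei(child):
            return False
    return True


doc = ET.fromstring(
    "<TEI><teiHeader/><text/>"
    "<TEI><teiHeader/><standOff/></TEI>"
    "</TEI>"
)
print(is_valid_tei(doc))
```

The same function handles a single document and a collection, which is the simplification being argued for here.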

I also see a small advantage for those of us who write software that
is supposed to handle a generic TEI document, rather than one from a
specific project. As it stands, such software has to be prepared to
handle either <TEI> or <teiCorpus> as the outermost element. If
<teiCorpus> did not exist, such programs would become one small step
easier to deal with.

However, the argument against this practice (using <TEI> instead of
<teiCorpus>) that Piotr foreshadows and Michael explicates is quite
valid: if TEI were to drop <teiCorpus>, closed TEI schemas created
with ODD will not be able to differentiate <TEI> elements that
represent a corpus from those that represent a single document.[1]

Open schemas (read "Schematron", including the Schematron embedded in
an ODD), XSLT programs, XQuery programs, TEI processing models, and
TEI expressions of default rendition will be able to quite easily
differentiate <TEI> elements that represent a corpus from those that
represent a single document. But, as Michael points out, it is the
closed schema that we often use to help us create valid documents in
the first place. (Note, too, that even in the case that <teiCorpus>
exists either as it does now or alongside nesting <TEI>, a closed TEI
schema created with ODD cannot differentiate a <teiCorpus> that is
the outermost <teiCorpus> from one that is 3 levels deep.[1])
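
As a rough illustration of the kind of rule an open schema could express, here is a Python sketch (standing in for Schematron, which would be the more usual tool) that tells corpus-like <TEI> elements from single-document ones and enforces one project-level constraint. The constraint itself, "a corpus-like <TEI> may not also hold a direct <text>", is a hypothetical project rule chosen only as an example, and the helper names are made up.

```python
import xml.etree.ElementTree as ET


def local(el):
    """Return the element name without any namespace prefix."""
    return el.tag.rsplit("}", 1)[-1]


def is_corpus_like(tei):
    """A <TEI> counts as corpus-like if it has at least one <TEI> child."""
    return any(local(child) == "TEI" for child in tei)


def check_no_mixed_content(root):
    """Report every corpus-like <TEI> that also has a direct <text> child."""
    problems = []
    for el in root.iter():
        if local(el) == "TEI" and is_corpus_like(el):
            if any(local(child) == "text" for child in el):
                problems.append("corpus-like <TEI> mixes <text> with nested <TEI>")
    return problems


doc = ET.fromstring(
    "<TEI><teiHeader/><text/>"
    "<TEI><teiHeader/><standOff/></TEI>"
    "</TEI>"
)
for message in check_no_mixed_content(doc):
    print(message)
```

Like Schematron, this runs after the fact on a finished document; it does not give the while-you-type guidance that a closed schema offers.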

So I think there are definitely pluses and minuses here. Personally,
I think the pluses outweigh the minuses. But I don't write TEI
documents that use <teiCorpus> by hand; the ex-post validation of an
open schema is just as good to me as the ex-ante validation of a
closed one. (And in addition, I could use Schematron Quick Fixes with
the open schema.) Those who do, or would, take advantage of the
ability to constrain <teiCorpus> differently than <TEI> in the closed
schema would be adversely affected by removing <teiCorpus>. How many
such folks are there, I wonder.

Note
----
 [1] These statements are not completely true. It is probably
     *possible* to write an ODD (although not a PureODD) from which a
     RELAX NG schema that can differentiate these things can be
     generated. But it involves serious hacking with direct RELAX NG
     code squirreled away in a manner that leaves your ODD somewhat
     fragile, and does not afford the complete spectrum of
     documentation capability that ODD offers. So I don't recommend
     it.

--
 Syd Bauman, NRP  (he/him/his)
 Senior XML Programmer/Analyst
 Northeastern University Women Writers Project
 [hidden email] or
 [hidden email]

Re: Nesting TEI and deprecation of teiCorpus

Piotr Bański
Hi Syd,

Thanks for the attempt at elucidating the issues at hand. While I would
prefer not to concentrate on the deprecation of teiCorpus and focus on
how inserting an intervening TEI element buries the idea behind
standOff, it is difficult not to follow the pattern that you have traced.

So I will attempt to stay on topic here and mention three points that
come to my mind with regard to the deprecation of teiCorpus, and if the
heat allows, will move on to standOff in a forked thread afterwards.

1. Firstly, I am as always a fan of your rhetoric :-) , but I need to
point out that the comparison you draw on the basis of the two XML
hierarchies below is strictly rhetorical and not very useful otherwise.

There is a certain semantics attached to teiCorpus, and, roughly, the
implication of using teiCorpus at the root is that you will not go ahead
to mix <text> and <teiCorpus> freely, as you do at the top of your
example. (Of course you can, but we know that we can do a lot within the
TEI but choose not to, and often use Schematron to guard and highlight
those choices.) The implication of using a teiCorpus at the top is to
use either a sequence of TEI elements below (after the main corpus
header), or a sequence of teiCorpus elements for subcorpora. teiCorpus
at the top is a pretty strong signal of what to expect below.

2. I think that the argument from greater ease of software creation that
you give next could be pushed ad absurdum:

 > I also see a small advantage for those of us who write software that
 > is supposed to handle a generic TEI document, rather than one from a
 > specific project. As it stands, such software has to be prepared to
 > handle either <TEI> or <teiCorpus> as the outermost element. If
 > <teiCorpus> did not exist, such programs would become one small step
 > easier to deal with.

I am sure that processing 100 different elements is easier than
processing 500 different elements, but that should not be a guideline
for thinning the TEI vocabulary. Rather, care should be given to prevent
structural ambiguity, and elimination of teiCorpus might (or rather:
will) lead to an increase of ambiguity. Instead of a disjunction to
handle teiCorpus vs. TEI, you will employ checks for what you are
dealing with (a leaf TEI or an umbrella TEI), at each level down the
hierarchy. Of course you can -- I'm just saying that I don't find an
argument from writing software convincing for the issue at hand.

3. Being able to root an instance in more than one element has never
seemed a bad thing to me, especially if the choice of the root were not
random. In fact, for years I've been wondering why the TEI doesn't allow
for a third potential root element, namely teiHeader -- to cater to all
those folks who want to use the TEI for the rich header structure, and
may be annoyed when forced to look at a withered text/ab dangling next
to the header, just because the schema requires that.
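
The per-level check described in point 2 could look roughly like the Python sketch below. It assumes un-namespaced element names for brevity; the "leaf" and "umbrella" labels follow the wording above, and the function name walk is made up for illustration.

```python
import xml.etree.ElementTree as ET


def walk(tei, depth=0):
    """Yield (depth, kind) for tei and every nested <TEI>, in document order."""
    nested = [child for child in tei if child.tag == "TEI"]
    # The leaf/umbrella decision has to be repeated at every level.
    yield depth, ("umbrella" if nested else "leaf")
    for child in nested:
        yield from walk(child, depth + 1)


doc = ET.fromstring(
    "<TEI><teiHeader/>"
    "<TEI><teiHeader/><TEI><teiHeader/><text/></TEI></TEI>"
    "<TEI><teiHeader/><text/></TEI>"
    "</TEI>"
)
for depth, kind in walk(doc):
    print("  " * depth + kind)
```

With <teiCorpus> the element name carries that decision; with nested <TEI> only, it has to be derived from the children each time.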

All that said, I admit that I am somewhat lukewarm on this issue except
that I don't see anything broke in the current state of things, and so
consequently I don't see why it should get "fixed". I'd rather see the
steam redirected elsewhere.

Best wishes,

   Piotr


On 7/2/19 4:38 AM, Syd Bauman wrote:

> [...]