error in description of teiCorpus

C. M. Sperberg-McQueen
The gloss of the teiCorpus element reads

    <teiCorpus> contains the whole of a TEI encoded corpus,
    comprising a single corpus header and one or more TEI
    elements, each containing a single text header and a text.

This was a true description in TEI P3 and TEI P4, but at
some point the content model was changed.  What follows
the header is no longer a sequence of ‘TEI’ elements
(and since the vocabulary can be modified, it is also not
necessarily a sequence of elements defined by the TEI,
so not necessarily ‘TEI elements’ in that sense).

The gloss should probably be changed to avoid stating
a falsehood.  Perhaps it would do just to say

    <teiCorpus> contains the whole of a TEI encoded corpus,
    comprising a single corpus header and one or more  
    elements representing part of the corpus, some of
    which may be teiCorpus or TEI elements with headers
    of their own.


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: error in description of teiCorpus

Lou Burnard-6
It's true that the description doesn't match the content model now
permitted, but I am not sure that I like your proposed rewording, since
"one or more elements representing part of the corpus" seems to
contradict the use case which led to Council agreeing this change back
in 2013 (see https://sourceforge.net/p/tei/feature-requests/456/). My
reading of that discussion is that any model.resourceLike element which
is a direct child of teiCorpus should represent the whole of (some
aspect of) the corpus, which is not quite what the suggested phrase
connotes.


Re: error in description of teiCorpus

C. M. Sperberg-McQueen
Fine with me; I hold no brief for my proposed wording, which I gather
is based on a misunderstanding of what a series of facsimile, fsdDecl,
sourceDoc, and text elements appearing as children of teiCorpus are
intended to be.  I’ll leave the task of formulating wording to those
who understand the intent, which despite Lou’s explanation I don’t.
Examples might help.

This may be a bigger problem than I realized at first.

Right now the function of fsdDecl, facsimile, sourceDoc, and text as
children of teiCorpus appears to be undocumented except in the
feature request to which Lou points; the descriptions of teiCorpus
(and language corpora in general) in chapters 15 and 4 of the
Guidelines say nothing about them, except to describe the teiCorpus
element as one teiHeader element followed by a series of ‘TEI’
elements, which entails that those elements are invalid as
children of teiCorpus.
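
As a rough illustration of the mismatch (a sketch only, not part of the Guidelines), the Python below parses a deliberately skeletal, hypothetical corpus and lists the children of teiCorpus that the old gloss does not account for. The sample document, the function names, and the choice to treat everything after the leading teiHeader as corpus content are assumptions made for this example; the sample is not claimed to be valid TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, skeletal corpus: headers and content are left empty, so this
# is only a shape to inspect, not a valid TEI document.
SAMPLE = f"""
<teiCorpus xmlns="{TEI_NS}">
  <teiHeader/>
  <facsimile/>
  <TEI>
    <teiHeader/>
    <text/>
  </TEI>
</teiCorpus>
""".strip()


def children_after_header(corpus_xml):
    """Return the local names of the teiCorpus children after the header."""
    root = ET.fromstring(corpus_xml)
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
    names = [child.tag.split("}", 1)[-1] for child in root]
    # Assumption: the first child, if it is a teiHeader, is the corpus header.
    return names[1:] if names and names[0] == "teiHeader" else names


def not_covered_by_old_gloss(corpus_xml):
    """Return the children that are not 'TEI' elements."""
    return [n for n in children_after_header(corpus_xml) if n != "TEI"]


print(children_after_header(SAMPLE))     # ['facsimile', 'TEI']
print(not_covered_by_old_gloss(SAMPLE))  # ['facsimile']
```

Anything reported by the second function is a child whose role the prose description leaves unexplained.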

It looks as if what was discussed and approved and implemented
was not a complete self-contained proposal that changes the
Guidelines from one self-consistent state to another self-consistent
state, but a proposal that entails an unspecified number of knock-on
changes to be worked out later: a promissory note with neither
amount nor due date filled in.

Perhaps those responsible for the maintenance and development of the
Guidelines should do as some standards development groups do, and
require that change proposals not introduce new contradictions into
the spec.  At a minimum, that means that if the change proposal calls
for changing the content model of an element, it also calls for
wording changes to every description of the element which reflects the
old content model but contradicts the new one.  Such a requirement
would make it harder to prepare change proposals, and it would make
the process feel heavier and slower.  That's probably a drawback.

But given that an interested user cannot now find out what a
/teiCorpus/facsimile element is supposed to mean or be, how much good
was achieved by the change to the content model?  Presumably those who
proposed the change are now able to use it while making their corpora
valid according to a vanilla TEI schema.  But since it's not possible
for a consumer of those corpora to find out what some things mean, we
seem to have lost one of the key promises to consumers of TEI-encoded
texts, which is that in any TEI-conforming document the meaning of the
markup language must be documented: either in the Guidelines, for
TEI-defined elements, or in the ODD file, for extensions and
modifications.


Good luck,

Michael


Re: error in description of teiCorpus

Hugh Cayless-2
It does look like we blew it on this one. Sebastian had a use case in mind and made the change to the content model without bothering to change the documentation. Looking at the original request, I have some sympathy for it, but I'm not convinced that it was the right way to solve the problem. Oh well...

We do try to keep the Guidelines prose and the ODD documentation in sync, but sometimes we fail. I'd like to think that nowadays, this sort of thing wouldn't proceed without discussion and input by the whole Council, which would mean it would get handled properly.

We'll have to try to sort out a sensible approach to dealing with this. Thanks for bringing it to our attention!

Hugh


Re: error in description of teiCorpus

Robinson, Peter
On behalf of the organizing committee (Marek Slon, Greg Crane and myself), I recommend to everyone’s attention a two-day workshop/conference convened by the Tadeusz Manteuffel Institute of History, Polish Academy of Sciences (IHPAN) in Warsaw this October.

Below is the conference call (closing on February 28). Conference participation may be by actual attendance in Warsaw, or online by streaming.  See also at www.atlasfontium.pl/edition2.0, Facebook profile https://www.facebook.com/historicalsourceedition2.0/, and Twitter (@edition2_0).

Best wishes

Peter Robinson

Call for Papers
International Conference


Historical source edition 2.0

Event: October 6–7, 2017
Abstract Due: February 28, 2017
Keywords: historical sources, historical editing, GIS, cartography, national heritage, on-line dissemination, digital humanities
Institution: Tadeusz Manteuffel Institute of History, Polish Academy of Sciences
Location: Warsaw, Poland; on-line worldwide
Organizational Committee: Gregory Crane, Peter Robinson, Marek Slon


Aims, scope, and eligibility

In the 19th and 20th centuries professional editing of historical sources was determined by printing technology, which dictated text as the only possible form of communication. The digital revolution has ended the monopoly of Gutenberg’s invention and has made it necessary to establish new principles in this field of knowledge. Methodology and techniques developed so far need to be adapted to contemporary trends. Although a significant number of valuable scientific editing projects have been implemented with the use of new technologies, there has been little general reflection on the impact of the new technologies. The main aim of our conference is to help fill this gap in research.

We would like to invite – first of all – practitioners to participate in the conference: project groups and scholars who have extensive experience in editing historical records with tools beyond text and simple picture copying. Databases, Geographic Information Systems (GIS), and linguistic analysis are by no means a complete list of tools, methods, and useful techniques; expansion into other topics would therefore be a valuable outcome of the conference. We would encourage all contributors to formulate answers to the following questions:
‘What is the essence of historical source editing nowadays?’ ‘What are its constitutive features and basic functions?’

Within this context, it would be necessary to address traditional definitions and practices. It is important to gain a broad perspective on this phenomenon, its past advances, current research, and a vision of the future. Presenting the consequences of the general assumptions used in contributors’ projects is considered extremely valuable.

Conference venue, terms and conditions

The conference venue is in Warsaw, Poland and – especially – in the virtual world. The conference language is English. We would like to invite 10 speakers and approximately twice as many registered debaters. Speakers and debaters may choose to be physically present at the Institute of History in Warsaw, or may present their papers and take part in debates online.
 
Abstracts

The deadline for submitting an abstract is February 28, 2017. Please send us a work-in-progress proposal (a title and an abstract 300–600 characters long). The Organizational Committee has 3 weeks to review submitted abstracts and arrange the final programme of the conference.

Full papers

There is no need to send an advanced draft or a full paper before the conference; however, we strongly recommend this option. Accepted papers can be shared with all registered participants and afterwards discussed during the conference. Full papers need to be submitted not later than December 15, 2017. Please remember that both versions of the paper (a draft and a full paper) need to be readable within half an hour, which means they need to be approximately 25,000 characters (or 4,000 words) long (including footnotes).

Conference

The conference will start on Friday, October 6, 2017, with the workshop. Professors Gregory Crane, Julio Escalona, Peter Robinson, and Marek Slon are going to present technical issues and the software used for their research. For more details, see the conference website.

On Saturday, October 7, 2017, full lectures and broad discussion are planned. There will also be enough time for a summary debate at the end of the conference.

Access to the conference is provided on-line via pre-selected Voice over Internet Protocol software and a peer-to-peer live video streaming system (all participants will be provided with technical details in May 2017 at the latest).

Only registered participants are able to join the discussion actively. Online live streaming will be available for a broad audience. A moderated chat will be open to all viewers, whose questions can be presented to the participants.

Publication

Conference papers will be published as an open access e-book with a print-on-demand option. A peer-reviewed book is going to be published by the Institute of History, Polish Academy of Sciences as a subsequent volume of the Institute publication series “Prace Atlasu Historycznego IH PAN”. The book will be licensed under the Creative Commons Attribution-ShareAlike license, also known as CC BY-SA 3.0 (or later), and it will be available to download. The deadline for full papers is November 30, 2017. The book will be published in June 2018.

Visit in Warsaw

It is possible to join us at the conference venue in Warsaw, Poland. However, the organizers are not able to cover travel, accommodation, or food expenses. Participants wishing to take advantage of (basic, simply furnished) guest rooms in the Institute of History should contact the conference secretary. The Institute is situated at the Market Square, where visitors can feel the welcoming atmosphere of Warsaw Old Town. We would also like to invite our guests to dinner on Friday, October 6, 2017.
Two bursaries of 250 euros are available to registered students, to assist with travel to the conference.
 
Fees

There is no conference fee. Contributors are asked to invest something much more valuable before the conference – their time and attention.

Contact

Please do not hesitate to ask the conference secretary for details about this event. The person responsible for contact with participants is Ms. Wieslawa Duzy, PhD. The conference e-mail address is [hidden email].

Website and social media

For more information about the conference and our editing projects, check out our website at www.atlasfontium.pl/edition2.0, our Facebook profile https://www.facebook.com/historicalsourceedition2.0/, and follow us on Twitter (@edition2_0).