advantages of TEI

Eduard Drenth

Dear all,


At the moment we are converting some 8000 corpora (runes; Old, Middle and New Frisian; dialects) at the Fryske Akademy to TEI.


By adopting an international standard, we hope to make it easier to do research on these data. Related to this, I am interested in experiences with:


1) doing research on TEI material

2) disclosure and publication (e.g. on the web) of TEI material
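As a rough illustration of point 1, a simple research query can be short once the encoding is consistent. This sketch assumes word tokens encoded as `<w lemma="...">` in the TEI namespace (an assumption, not necessarily how the Fryske Akademy corpora are encoded); the function `kwic` and the sample sentence are hypothetical. It produces keyword-in-context lines for a given lemma.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def kwic(root, lemma, width=2):
    """Keyword-in-context lines for every <w> whose @lemma equals `lemma`."""
    # Flatten all word tokens in document order as (lemma, surface form).
    tokens = [(w.get("lemma"), (w.text or "").strip()) for w in root.iter(f"{TEI}w")]
    lines = []
    for i, (lem, form) in enumerate(tokens):
        if lem == lemma:
            # Up to `width` surface forms on each side of the hit.
            left = " ".join(f for _, f in tokens[max(0, i - width):i])
            right = " ".join(f for _, f in tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{form}] {right}".strip())
    return lines


# Hypothetical sample text, for illustration only.
SAMPLE = ('<p xmlns="http://www.tei-c.org/ns/1.0">'
          '<w lemma="the">the</w> <w lemma="cat">cats</w> <w lemma="be">are</w> '
          '<w lemma="the">the</w> <w lemma="cat">cat</w></p>')
```

For example, `kwic(ET.fromstring(SAMPLE), "cat")` returns `["the [cats] are the", "are the [cat]"]`. The same query becomes harder as soon as lemmas may also be encoded in other ways, which is the concern raised below.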


Experiments with OxGarage (http://www.tei-c.org/oxgarage/) have given minimal, disappointing results.


I have not yet tried https://github.com/TEIC/Stylesheets; perhaps it offers better results.


I wonder how much effort and development (XSLT and XQuery, among other things) these two aspects (publication and research) will take.


TEI offers flexibility and freedom (e.g. <span type="lemma" target="w1 w2"> instead of <lemma target="w1 w2">) that complicate tool development. How big a problem is this?
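To make the cost of that flexibility concrete: a tool that must accept more than one form usually normalizes them into a single internal representation. A minimal sketch, assuming lemmas appear either as `@lemma` on `<w xml:id="...">` or as `<span type="lemma" target="w1 w2">` whose text content is the lemma (the `<lemma>` element from the question is hypothetical and not handled); `collect_lemmas` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_lemmas(root):
    """Return (word_ids, lemma) pairs, whichever of two encodings was used."""
    pairs = []
    # Variant A: the lemma is an attribute of the word token itself.
    for w in root.iter(f"{TEI}w"):
        if w.get("lemma") and w.get(XML_ID):
            pairs.append(((w.get(XML_ID),), w.get("lemma")))
    # Variant B: a <span type="lemma"> points at one or more word ids;
    # targets may be written "#w1" or "w1", so strip a leading '#'.
    for span in root.iter(f"{TEI}span"):
        if span.get("type") == "lemma":
            ids = tuple(ref.lstrip("#") for ref in span.get("target", "").split())
            pairs.append((ids, (span.text or "").strip()))
    return pairs


# Hypothetical document mixing both variants.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<w xml:id="w1">New</w> <w xml:id="w2">York</w> <w xml:id="w3" lemma="go">went</w>
<span type="lemma" target="w1 w2">New York</span>
</p></body></text></TEI>"""
```

Here `collect_lemmas(ET.fromstring(SAMPLE))` yields `[(("w3",), "go"), (("w1", "w2"), "New York")]`. Every additional variant a project permits means another branch like these, which is why the replies below stress project-level conventions.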



Eduard Drenth, Software Architect


[hidden email]


Doelestrjitte 8

8911 DX  Ljouwert

+31 58 213 14 14

chat: [hidden email]

Re: advantages of TEI

Magdalena Turska
Dear Eduard,

I will skip the 'advantages of TEI' part, or we will be discussing it well into the New Year :-)

I will address the publication part, urging you to look into the TEI Processing Model, and in particular the TEI Publisher app running on top of eXist-db. TEI Publisher can be found at http://showcases.exist-db.org/exist/apps/Showcases/index.html
Other apps there[1] were generated with TEI Publisher, some with only minimal intervention afterwards (such as setting up the website logo), e.g. the EEBO app. If you want my personal opinion[2], it is probably the easiest option for setting up an online publication of TEI documents at the moment. You will find the documentation for TEI Publisher, which also covers the basics of the TEI Processing Model, here: http://showcases.exist-db.org/exist/apps/tei-publisher/doc/documentation.xml?odd=documentation.odd
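Roughly, the idea behind the TEI Processing Model is that an ODD assigns each element an abstract behaviour (such as paragraph, heading or inline), and a renderer implements each behaviour once per output format. The toy sketch below imitates only that separation; it is not TEI Publisher's API or the Processing Model's actual syntax, and the `MODEL` mapping, `BEHAVIOURS` table and `render` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical element -> behaviour mapping (the part an ODD would declare).
MODEL = {"head": "heading", "p": "paragraph", "hi": "inline", "lb": "break"}

# Each behaviour is implemented once, independent of which element uses it.
BEHAVIOURS = {
    "heading": lambda content: f"<h2>{content}</h2>",
    "paragraph": lambda content: f"<p>{content}</p>",
    "inline": lambda content: f"<span>{content}</span>",
    "break": lambda content: "<br/>",
}


def render(el):
    """Render a TEI element to HTML, keeping text and tails in document order."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    content = "".join(parts)
    behaviour = MODEL.get(el.tag.replace(TEI, ""))
    # Elements without a model entry are passed through as plain content.
    return BEHAVIOURS[behaviour](content) if behaviour else content
```

For instance, rendering `<div><head>Title</head><p>One <hi>two</hi><lb/>three</p></div>` (in the TEI namespace) gives `<h2>Title</h2><p>One <span>two</span><br/>three</p>`. Adding a new output format would mean a new behaviour table, not new per-element code.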

Best,

Magdalena

[1] with the notable exception of Shakespeare's Plays - Plain XQuery Edition
[2] which may be biased, since I had a part in this project

On 2 November 2016 at 11:46, Eduard Drenth <[hidden email]> wrote:

Re: advantages of TEI

Paul Broyles
In reply to this post by Eduard Drenth
Eduard,

Regarding the question of how flexibility and freedom complicate development, many projects develop internal encoding guidelines to provide consistency among their texts. Thus if you're developing a display system for the output of one particular project, you have a much more limited set of possibilities to aim for.
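A minimal sketch of how internal guidelines narrow what a tool has to handle: once a (hypothetical) house rule says lemmas are always `@lemma` on `<w>` and never `<span type="lemma">`, a check can flag deviations before they reach the display system. Real projects would more likely express such rules in an ODD customization or Schematron; the Python function `check_guidelines` is only an illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def check_guidelines(root):
    """Report violations of two hypothetical house rules:
    every <w> must carry @lemma, and lemmas must not be encoded as <span>."""
    problems = []
    for w in root.iter(f"{TEI}w"):
        if not w.get("lemma"):
            problems.append(f"<w> without @lemma: {w.text!r}")
    for span in root.iter(f"{TEI}span"):
        if span.get("type") == "lemma":
            problems.append("lemma encoded as <span>; use <w lemma=...> instead")
    return problems
```

With the rules enforced at encoding time, the display code only ever needs to read `@lemma` on `<w>`.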

These documents can also be useful to provide to encoders who are relative TEI novices, as they can help explain how TEI encoding works in a more beginner-friendly way than the official documentation.

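Here are a handful of examples I have handy and have found useful (the last is from a project I'm working on, though I had no hand in the guidelines):
http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf
http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf
http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines
http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html

There are many more examples out there, though unfortunately they can be hard to search for, as they go under varied names, from "Transcriptional Protocols" to "Encoding Guidelines" to "Technical Introduction."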

Paul

--
Paul A. Broyles, Ph.D.
CLIR Postdoctoral Fellow in Data Curation for Medieval Studies
North Carolina State University

All electronic mail messages in connection with State business which are sent to or received by this account are subject to the NC Public Records Law and may be disclosed to third parties.

Re: advantages of TEI

Eduard Drenth
In reply to this post by Magdalena Turska

Thanks, I'll definitely have a look at eXist-db...


From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Magdalena Turska <[hidden email]>
Sent: Wednesday, November 2, 2016 1:34 PM
To: [hidden email]
Subject: Re: advantages of TEI
 

Re: advantages of TEI

Magdalena Turska
In reply to this post by Paul Broyles
Following on from Paul's list, such project-specific policies could even be called 'cheatsheets', as in Marjorie Burghart's http://marjorie.burghart.online.fr/?q=en/content/tei-critical-apparatus-cheatsheet

Magdalena

On 2 November 2016 at 13:18, Paul Broyles <[hidden email]> wrote:

Re: advantages of TEI

Lou Burnard-6
In reply to this post by Paul Broyles
Might I also mention that the TEI also provides a specific XML vocabulary for creating such specifications...

Sent from my Honor Mobile

Re: advantages of TEI

Piotr Bański
In reply to this post by Magdalena Turska
A quick question to all: where in the TEI wiki structure would you
imagine this set of links?

@Paul: thanks for sharing! :-)

Best,

   Piotr

On 02/11/16 14:33, Magdalena Turska wrote:


Re: advantages of TEI

Lou Burnard-6
There is a TEI projects page on the website, which is where I would look for this kind of information.

Re: advantages of TEI

Gioele Barabucci-2
In reply to this post by Paul Broyles
On 02/11/2016 14:18, Paul Broyles wrote:

> These documents can also be useful to provide to encoders who are
> relative TEI novices, as they can help explain how TEI encoding works in
> a more beginner-friendly way than the official documentation.
>
> Here are a handful of examples I have handy and have found useful (the
> last is from a project I'm working on, though I had no hand in the
> guidelines):
> http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf
> http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf
> http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines
> http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html

Hello,

I'll take this chance to plug the encoding guidelines I am supervising at
the moment. Not even an alpha version yet, but maybe useful to others. Comments
are welcome.

https://thomas-institut.github.io/averroes-tei/averroes-guidelines.html

The source code repository (100% TEI ODD) will be published in December.
Drop me an email if you want a snapshot of it.

Regards,

--
Gioele Barabucci <[hidden email]>

Re: advantages of TEI

Martin Mueller
In reply to this post by Eduard Drenth

Dear Eduard,

 

With a project like yours you can get a lot of prompt, free, and excellent advice from the TEI List. It never lets you down.

 

As for things like <span type="lemma" target="w1 w2">, no technology will per se protect you from folly, and consistency is for the most part a matter of human attention, with some useful help from machines. But the TEI gives you patterns to follow, and you can draw on the wealth of experience that projects all over the world have gathered and are more than willing to share.

 

MM

Re: advantages of TEI

Serge Heiden-2
In reply to this post by Eduard Drenth
Hi Eduard,

On 02/11/2016 at 12:46, Eduard Drenth wrote:
> TEI offers flexibility and freedom (i.g. <span type="lemma" target="w1 w2"> instead of <lemma target="w1 w2">) that complicates tool development. How big of a problem is this?
From an IT perspective, working with TEI-encoded texts is like catching chameleons hopping in XML trees.
If you work with the people feeding the chameleons, you can negotiate some synchronized convergence of colors.
Not necessarily the colors of the chameleons themselves (aka local encoding guidelines), but at least how
you are supposed to see them.
We develop a text analysis and publishing platform called TXM (http://textometrie.ens-lyon.fr/?lang=en),
with which we regularly use this strategy, through XSLT adapters, to help colleagues analyze and publish their TEI texts.
This is often needed because projects tend to encode their texts before choosing a final analysis and publishing platform.
However, it is not easy to choose a TEI-aware analysis and publishing platform (established software or adaptable framework), because it is not easy to specify what analyzing and reading mean.
A kind of "chicken or egg" dilemma, with chameleons...
See this tutorial for an introduction to TXM's TEI import strategy, with examples (sorry, it is in French):
https://groupes.renater.fr/wiki/txm-users/public/tutoriels_import_xml-tei

Another strategy is to negotiate convergence on a manageable subset of TEI, such as TEI Lite, Tite, Simple, Zero...
(we are designing the latter for TXM work).
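A rough sketch of the "subset" strategy: elements outside an agreed subset are unwrapped, so their text survives but their markup does not. The `SUBSET` below is hypothetical and does not correspond to TEI Lite, Tite, Simple or Zero; `reduce_to_subset` is illustrative Python using only the standard library, whereas the TXM workflow described above relies on XSLT adapters.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical subset of element names a downstream tool is assumed to support.
SUBSET = {"TEI", "teiHeader", "text", "body", "div", "head", "p", "w", "lb"}


def _append_text(parent, index, text):
    """Append `text` just before child position `index` of `parent`:
    to the previous sibling's tail, or to parent.text if index is 0."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def reduce_to_subset(el):
    """Unwrap, in place, every descendant of `el` whose name is not in SUBSET.
    Text, tails and permitted children of unwrapped elements are kept."""
    i = 0
    while i < len(el):
        child = el[i]
        reduce_to_subset(child)  # clean the child's own descendants first
        if child.tag.replace(TEI, "") in SUBSET:
            i += 1
            continue
        grandchildren = list(child)
        el.remove(child)
        _append_text(el, i, child.text)
        for offset, grandchild in enumerate(grandchildren):
            el.insert(i + offset, grandchild)
        _append_text(el, i + len(grandchildren), child.tail)
        i += len(grandchildren)  # these were already reduced above
```

Unwrapping rather than deleting is a deliberate choice here: the reduced document loses markup the tool cannot interpret, but never loses text.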

Best,
Serge

-- 
Dr. Serge Heiden, [hidden email], http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - IHRIM UMR5317
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tel. +33(0)622003883

Re: advantages of TEI

Piotr Bański
In reply to this post by Lou Burnard-6
Hi Lou,

Of course I know the page, and of course I don't think it's very useful
for this kind of listing. Clickety-click through 80 sub-pages, not me... :-)

Cheers,

   P.

On 02/11/16 14:39, Lou Burnard wrote:


Re: advantages of TEI

Lou Burnard-6
You prefer a single page with 80 links you have to clickety-click through to check whether they're any use to you? Each to their own...

Sent from my Honor Mobile

-------- Original Message --------
Subject: Re: advantages of TEI
From: Piotr Bański
To: Lou Burnard ,[hidden email]
CC:

Hi Lou,

Of course I know the page, and of course I don't think it's very useful
for this kind of listing. Clickety-click through 80 sub-pages, not me... :-)

Cheers,

   P.

On 02/11/16 14:39, Lou Burnard wrote:
> There is a tei projects page on the website which is where I would look
> for this kind of information.
>
> Sent from my Honor Mobile
>
> -------- Original Message --------
> Subject: Re: advantages of TEI
> From: Piotr Bański
> To: [hidden email]
> CC:
>
> A quick question to all: where in the TEI wiki structure would you
> imagine this set of links?
>
> @Paul: thanks for sharing! :-)
>
> Best,
>
>    Piotr
>
> On 02/11/16 14:33, Magdalena Turska wrote:
>> Following on Paul's list, such project-specific policies could even be
>> called 'cheatsheets', as in Marjorie
>> Burghart's http://marjorie.burghart.online.fr/?q=en/content/tei-critical-apparatus-cheatsheet
>>
>> Magdalena
>>
>> On 2 November 2016 at 13:18, Paul Broyles <[hidden email]
>> <[hidden email]>> wrote:
>>
>>     Eduard,
>>
>>     Regarding the question of how flexibility and freedom complicate
>>     development, many projects develop internal encoding guidelines to
>>     provide consistency among their texts. Thus if you're developing a
>>     display system for the output of one particular project, you have a
>>     much more limited set of possibilities to aim for.
>>
>>     These documents can also be useful to provide to encoders who are
>>     relative TEI novices, as they can help explain how TEI encoding
>>     works in a more beginner-friendly way than the official documentation.
>>
>>     Here are a handful of examples I have handy and have found useful
>>     (the last is from a project I'm working on, though I had no hand in
>>     the guidelines):
>>     http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf
>>     http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf
>>     http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines
>>     http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html
>>
>>     There are many more examples out there, though unfortunately they
>>     can be hard to search for as they go under varied names, from
>>     "Transcriptional Protocols" to "Encoding Guidelines" to "Technical
>>     Introduction."
>>
>>     Paul
>>
>>

Re: advantages of TEI

James Cummings-4
In reply to this post by Serge Heiden-2
Hi Eduard and Serge,

One of the approaches I think more general-purpose TEI software
development should take is to specify its input formats and to
provide generalised conversions from tei_all to the subset that
the software does something useful with. (OK, this is perhaps less
relevant for things like TXM, general editors, or database
frameworks.) But if we imagine a new tool to display, visualize,
or process TEI, there is no reason it should necessarily cope with
the whole of the TEI. It can use the TEI ODD customisation
language to specify a meta-schema that it can handle (and, as
Magdalena was noting, include processing model information in
that, so it could act as a sort of configuration file for that
processing).

If your software won't do anything with <w> elements and just
ignores their existence, then don't include them in the schema,
and let people get errors/warnings about them. If you have a fixed
list of @type values your software expects on <name>, then
document that in TEI ODD. Then, through schema errors or Schematron
warnings, a user can test whether their source documents are
processable by that bit of software. Even better if there is then
a tei_all to MySpecialSoftware conversion script which throws away
all the stuff this piece of software is going to ignore or fail on.

I know the next question will be: why are we encoding it if we then
throw it away? Clearly the answer is that we may throw it away for
_this_ bit of processing or visualization or whatnot, but that
doesn't mean it isn't crucial for other bits of analysis and
research.

So your TEI Zero, for example: I can validate against its schema,
and if I don't get any warnings, then I know that your software
won't have a problem with it. If I do, I can judge whether they are
errors, or warnings like "Your <name type="thingy"> will be treated
as <name type="other"> in our software", and then make an informed
decision about how the software will work with my texts or whether
I should convert them to match your values. I realise, of course,
that some people already do this, but it may be worth reiterating,
as it seems a lot more practical than people trying to develop
software that will cope with any of the TEI vocabulary (never mind
new things a project adds...).

-James


On 02/11/16 14:14, Serge Heiden wrote:

> Hi Eduard,
>
> On 02/11/2016 at 12:46, Eduard Drenth wrote:
>> TEI offers flexibility and freedom (i.g. <span type="lemma"
>> target="w1 w2"> instead of <lemma target="w1 w2">) that
>> complicates tool development. How big of a problem is this?
> From an IT perspective, working with TEI-encoded texts is like
> catching chameleons hopping in XML trees.
> If you work with the people feeding the chameleons, you can negotiate
> some synchronized convergence of colors:
> not necessarily the colors of the chameleons themselves (aka
> local encoding guidelines), but at least how
> you are supposed to see them.
> We develop a text analysis and publishing platform called TXM
> (http://textometrie.ens-lyon.fr/?lang=en),
> with which we regularly use this strategy through XSLT adapters
> to help colleagues analyze and publish their TEI texts, often
> because projects tend to encode their texts before choosing a
> final analysis and publishing platform.
> However, it is not easy to choose a TEI-aware analysis and
> publishing platform (established software or adaptable
> framework), because it is not easy to specify what analyzing and
> reading mean.
> A kind of "chicken or egg" dilemma, with chameleons...
> See this tutorial for an introduction to the TXM TEI import
> strategy, with examples (sorry, it is in French):
> https://groupes.renater.fr/wiki/txm-users/public/tutoriels_import_xml-tei
>
> Another strategy is to negotiate convergence to a manageable
> subset of TEI, such as TEI Lite, Tite, Simple, Zero...
> (we are designing the latter for TXM work).
>
> Best,
> Serge
>
> --
> Dr. Serge Heiden,[hidden email],http://textometrie.ens-lyon.fr
> ENS de Lyon/CNRS - IHRIM UMR5317
> 15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883


--
Dr James Cummings, Academic IT Services, University of Oxford,
TEI Consultations: [hidden email]
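James's proposal is to state a tool's input profile in a TEI ODD customisation and let schema errors or Schematron warnings tell users whether their files fit. The Python sketch below only imitates the reporting side of that idea with the standard library, so that the two kinds of message he mentions are easy to see; it is not ODD or Schematron, which is where such rules would actually live. The profile (`SUPPORTED_ELEMENTS`, `NAME_TYPES`, the `"other"` fallback) is a hypothetical stand-in for what a real ODD would declare.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical profile for a tool; a real project would declare this in ODD.
SUPPORTED_ELEMENTS = {
    "TEI", "teiHeader", "fileDesc", "titleStmt", "title", "publicationStmt",
    "sourceDesc", "text", "body", "div", "head", "p", "name",
}
NAME_TYPES = {"person", "place", "org"}   # @type values the tool understands
FALLBACK_NAME_TYPE = "other"              # what unknown values are mapped to


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def check_profile(xml_text: str) -> List[str]:
    """Report, in document order, what the tool would reject or reinterpret."""
    messages = []
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name not in SUPPORTED_ELEMENTS:
            # Comparable to a schema error: the tool does nothing with it.
            messages.append(f"error: <{name}> is not supported by this tool")
        elif name == "name":
            value = elem.get("type")
            if value is not None and value not in NAME_TYPES:
                # Comparable to a Schematron warning: processed, but reinterpreted.
                messages.append(
                    f'warning: <name type="{value}"> will be treated '
                    f'as <name type="{FALLBACK_NAME_TYPE}">'
                )
    return messages


if __name__ == "__main__":
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        '<p>Met <name type="thingy">Gysbert</name> in <w>Ljouwert</w>.</p>'
        "</body></text></TEI>"
    )
    for message in check_profile(sample):
        print(message)
```

An empty result means the document fits the profile; otherwise the user can decide, as James puts it, whether to accept the tool's interpretation or convert their values first.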

Re: advantages of TEI

Martin Holmes
I agree wholeheartedly with James here, but I also think that it would
be extremely beneficial if projects provided their XML not only in the
project's own format but also converted to a range of different, more
generic TEI schemas, as I described at DH 2015:

<http://dh2015.org/abstracts/xml/HOLMES_Martin_Whatever_Happened_to_Interchange_//HOLMES_Martin_Whatever_Happened_to_Interchange_.html>

and as we do at the Map of Early Modern London project:

<http://mapoflondon.uvic.ca/map.htm>

where if you click on "See XML", you'll find five versions you can
download. If you have a publication engine that handles (say) TEI Lite,
you should be able to grab a TEI Lite version of a project's XML and
push it into your publication engine. I believe it's preferable for all
sorts of reasons for the originating project to provide such
conversions, rather than forcing the end user to write them.

Cheers,
Martin

On 2016-11-02 09:49 AM, James Cummings wrote:

> Hi Eduard and Serge,
>
> One of the approaches I think more generalistic TEI software development
> should take is specification of input formats and provide generalised
> conversions from tei_all to the subset that the software does something
> useful with. (Ok, this is perhapsless relevant for things like TXM,
> general editors, or database frameworks.)  But if we imagine a new tool
> to display, visualize, or process TEI there is no reason it should
> necessarily cope with the whole of the TEI. It can use the TEI ODD
> customisation language to specify a meta-schema that it can handle. (And
> as Magdalena was noting including processing model information in that
> so it could act as a sort of configuration file for that processing.)
> If your software won't do anything with <w> elements and just ignore
> their existence then don't include them in the schema and let people get
> errors/warnings about them.  If you have a fixed list of @type
> attributes your software expects on <name>, then document that in TEI
> ODD. And then through schema errors or schematron warnings a user can
> test if their source documents are processable by that bit of software.
> Even better if there is then a tei_all to MySpecialSoftware conversion
> script which throws away all the stuff this piece of software is going
> to ignore or fail on.  I know the next question will be why are we
> encoding it if we then throw it away -- and clearly the answer is we may
> throw it away for _this_ bit of processing or visualization or whatnot,
> but that doesn't mean it isn't crucial for other bits of analysis and
> research.  So your TEI Zero, for example, I can validate against its
> schema and if I don't get any warnings then I know that your software
> won't have a problem with it. If I do, I can judge if they are errors,
> or warnings like  -- "Your <name type="thingy"> will be treated as <name
> type="other"> in our software" -- then I can make an informed decision
> about how the software will work with my texts or whether I should
> convert them to match your values. I realise, of course, some people
> already do this, but it may be worth reiterating as it seems a lot more
> practical than people trying to develop software that will cope with any
> of the TEI vocabulary (never mind new things a project adds...).
>
> -James
>
>
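Martin's downloadable variants and James's "tei_all to MySpecialSoftware" script both come down to a lossy down-conversion. Below is a minimal Python sketch of one such step, under the assumption that dropping markup while keeping the words is acceptable: it unwraps every element outside a target subset (the root is always kept), preserving text and any supported descendants, so `<w>` tokenisation disappears but the words survive. A real conversion to, say, TEI Lite would also have to map elements and attributes rather than only unwrap them, and in TEI projects such conversions are often written in XSLT, like Serge's adapters; the subset and function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

TEI_NS = "http://www.tei-c.org/ns/1.0"


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def _append_text_before(parent: ET.Element, index: int, text: Optional[str]) -> None:
    """Append text to whatever precedes parent[index] (a sibling's tail or parent.text)."""
    if not text:
        return
    if index > 0:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def unwrap_unsupported(parent: ET.Element, supported: Set[str]) -> None:
    """Replace each unsupported descendant of parent by its content, in place."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap_unsupported(child, supported)  # clean the subtree first
        if local_name(child.tag) in supported:
            i += 1
            continue
        grandchildren = list(child)
        _append_text_before(parent, i, child.text)
        if grandchildren:
            last = grandchildren[-1]
            last.tail = (last.tail or "") + (child.tail or "")
        else:
            _append_text_before(parent, i, child.tail)
        parent.remove(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(i + offset, grandchild)
        i += len(grandchildren)  # spliced-in children are already clean


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    p = ET.fromstring(
        f'<p xmlns="{TEI_NS}">We <w>walked</w> to '
        '<name type="place"><w>Ljouwert</w></name>.</p>'
    )
    unwrap_unsupported(p, {"p", "name"})
    print(ET.tostring(p, encoding="unicode"))
```

Unwrapping rather than deleting is the safer default for a sketch like this, because deleting an unsupported element would also delete the transcribed text inside it.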

Re: advantages of TEI

Piotr Bański
In reply to this post by Lou Burnard-6
We're talking about this page:

http://www.tei-c.org/Activities/Projects/

and the sub-thread began with a bunch of links to project-level
guidelines. You're not going to find these links on any of the subpages
of Projects/.

So the quick thought was: someone has done the tough job of looking
through those projects and finding their local guidelines. It might
(maybe) make sense to keep the results somewhere in a dynamic setting,
where they can either grow into an even more useful list or... not grow,
and in the latter case no one is going to cry about that. The Projects/
page, useful for PR purposes perhaps, is not very useful as a quick
source of local guidelines. (And note that I abstract away from whether
these local guidelines are good/useful/whathaveyou. That's another
thing, but whether we end up classifying them as useful or not, and
maybe even noticing valuable or unwanted tendencies in preparing local
guidelines, it's rather good to _have_ those guidelines at a distance of
a single click).

But I'm not sure where to put them at a minimal time investment on my
part, to make them maximally useful to others. Hence my initial question.

Best,

   P.


On 02/11/16 17:34, Lou Burnard wrote:

> You prefer a single page with 80 links you have to clicketty click to
> check if they're any use to you? Chain son gout...

Re: advantages of TEI

Piotr Bański
In reply to this post by Martin Holmes
And I agree wholeheartedly with your statement about the designated
generic output format being a responsibility of individual projects, and
will be happy to reread your paper (thanks).

I would even venture further:
* (dangerous) For some projects, the promise of providing export to at
least one TEI-blessed format (even if it ends up somewhat lossy)
should be a precondition for funding.

* (not dangerous) An online validator for those special generic formats
(in their vanilla versions) would be great to have.

Best,

   P.

On 02/11/16 18:11, Martin Holmes wrote:

> I agree wholeheartedly with James here, but I also think that it would
> be extremely beneficial if projects provided their XML not only in the
> project's own format but also converted to a range of different, more
> generic TEI schemas, as I described at DH 2015:
>
> <http://dh2015.org/abstracts/xml/HOLMES_Martin_Whatever_Happened_to_Interchange_//HOLMES_Martin_Whatever_Happened_to_Interchange_.html>
>
>
> and as we do at the Map of Early Modern London project:
>
> <http://mapoflondon.uvic.ca/map.htm>
>
> where if you click on "See XML", you'll find five versions you can
> download. If you have a publication engine that handles (say) TEI Lite,
> you should be able to grab a TEI Lite version of a project's XML and
> push it into your publication engine. I believe it's preferable for all
> sorts of reasons for the originating project to provide such
> conversions, rather than forcing the end user to write them.
>
> Cheers,
> Martin

Re: advantages of TEI

Elisa Beshero-Bondar
In reply to this post by Piotr Bański
Hi Piotr,
That page badly needs to be updated, too, and by each of the project maintainers; I keep forgetting to update the links to my projects there!

We've been discussing in Council, and with Kevin Hawkins (who's working on a new incarnation of the TEI website), that we really should have a page that guides project designers with sample ODD documentation geared to specific kinds of material (e.g. CBML, an ODD developed for projects that digitally model comic books, and other ODD customizations that seem useful starting points). Helping people to read and review project customizations would be the goal. Would a page featuring interesting and useful ODD customizations be an improvement?

Elisa

On Wed, Nov 2, 2016 at 1:26 PM, Piotr Bański <[hidden email]> wrote:
We're talking about this page:

http://www.tei-c.org/Activities/Projects/

and the sub-thread began with a bunch of links to project-level guidelines. You're not going to find these links on any of the subpages of Projects/

So the quick thought was: someone has done the tough job of looking through those projects and finding their local guidelines. It might (maybe) make sense to keep the results somewhere in a dynamic setting, where it can either grow into an even more useful list or.. not grow, and in the latter case no one is going to cry about that. The Projects/ page, useful for PR purposes perhaps, is not very useful as a quick source of local guidelines. (And note that I abstract away from whether these local guidelines are good/useful/whathaveyou. That's another thing, but whether we end up classifying them as useful or not, and maybe even noticing valuable or unwanted tendencies in preparing local guidelines, it's rather good to _have_ those guidelines at a distance of a single click).

But I'm not sure where to put them at a minimal time investment on my part, to make them maximally useful to others. Hence my initial question.

Best,

  P.



On 02/11/16 17:34, Lou Burnard wrote:
You prefer a single page with 80 links you have to clicketty click to
check if they're any use to you? Chain son gout...

Sent from my Honor Mobile

-------- Original Message --------
Subject: Re: advantages of TEI
From: Piotr Bański
To: Lou Burnard ,[hidden email]
CC:

Hi Lou,

Of course I know the page, and of course I don't think it's very useful
for this kind of listing. Clickety-click through 80 sub-pages, not me... :-)

Cheers,

   P.

On 02/11/16 14:39, Lou Burnard wrote:
There is a tei projects page on the website which is where I would look
for this kind of information.

Sent from my Honor Mobile

-------- Original Message --------
Subject: Re: advantages of TEI
From: Piotr Bański
To: [hidden email]
CC:

A quick question to all: where in the TEI wiki structure would you
imagine this set of links?

@Paul: thanks for sharing! :-)

Best,

   Piotr

On 02/11/16 14:33, Magdalena Turska wrote:
Following on Paul's list, such project specific policies could be even
called 'cheatsheets' as in Marjorie
Burkhart's http://marjorie.burghart.online.fr/?q=en/content/tei-critical-apparatus-cheatsheet

Magdalena

On 2 November 2016 at 13:18, Paul Broyles <[hidden email]
<mailto:[hidden email]>> wrote:

    Eduard,

    Regarding the question of how flexibility and freedom complicate
    development, many projects develop internal encoding guidelines to
    provide consistency among their texts. Thus if you're developing a
    display system for the output of one particular project, you have a
    much more limited set of possibilities to aim for.

    These documents can also be useful to provide to encoders who are
    relative TEI novices, as they can help explain how TEI encoding
    works in a more beginner-friendly way than the official documentation.

    Here are a handful of examples I have handy and have found useful
    (the last is from a project I'm working on, though I had no hand in
    the guidelines):
    http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf
    <http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf>
    http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf
    <http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf>
    http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines
    <http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines>
    http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html
    <http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html>

    There are many more examples out there, though unfortunately they
    can be hard to search for as they go under varied names, from
    "Transcriptional Protocols" to "Encoding Guidelines" to "Technical
    Introduction."

    Paul


    --
    Paul A. Broyles, Ph.D.
    CLIR Postdoctoral Fellow in Data Curation for Medieval Studies
    North Carolina State University
    [hidden email]

    All electronic mail messages in connection with State business which
    are sent to or received by this account are subject to the NC Public
    Records Law and may be disclosed to third parties.





--
Elisa Beshero-Bondar, PhD
Director, Center for the Digital Text | Associate Professor of English
University of Pittsburgh at Greensburg | Humanities Division
150 Finoli Drive
Greensburg, PA  15601  USA
E-mail:[hidden email]
Development site: http://newtfire.org

Re: advantages of TEI

Serge Heiden
In reply to this post by James Cummings
Hi James,

On 02/11/2016 at 17:49, James Cummings wrote:
One of the approaches I think more general-purpose TEI software development should take is to specify its input formats and provide generalised conversions from tei_all to the subset that the software does something useful with. (OK, this is perhaps less relevant for things like TXM, general editors, or database frameworks.) But if we imagine a new tool to display, visualize, or process TEI, there is no reason it should necessarily cope with the whole of the TEI. It can use the TEI ODD customisation language to specify a meta-schema that it can handle. (And, as Magdalena was noting, it could include processing model information in that ODD so that it acts as a sort of configuration file for that processing.)
TEI Simple is a step in that direction, for a specific processing model whose semantics we could perhaps augment.
What is difficult for us is:
1) to know our own processing model (we also have chameleons on our side...)
2) to represent our own processing model in such a container.
If your software won't do anything with <w> elements and just ignore their existence then don't include them in the schema and let people get errors/warnings about them.
If you have a fixed list of @type values your software expects on <name>, then document that in TEI ODD. And then through schema errors or Schematron warnings a user can test whether their source documents are processable by that bit of software. Even better if there is then a tei_all to MySpecialSoftware conversion script which throws away all the stuff this piece of software is going to ignore or fail on.
Currently in TXM we allow you to use <w> or not:
a) all words are encoded by <w> (generally because some tool has already done some work on a previous representation of the text);
b) some words are encoded by <w> (very useful for a progressive, lazy encoding strategy: an encoding/analysis cycle);
c) no words are encoded by <w> (the most frequent case).
Case b) is the most difficult to process, but for a) we still have to decide on a decoding strategy to get the word form and other properties, and for c) it is very often still difficult to decide which 'base text' character stream we must rely on to build words.
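To make the three cases concrete, here is a minimal Python sketch that classifies the <body> of a TEI document as case a), b) or c). It is not TXM's implementation: the function names and the heuristic (any run of word characters in a text node outside every <w> counts as an unencoded word) are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
WORD = re.compile(r"\w+")  # heuristic: a "word" is a run of word characters


def stray_words(elem, inside_w=False):
    """Count word-like tokens in text nodes that are not inside any <w>."""
    inside = inside_w or elem.tag == TEI + "w"
    count = 0
    if not inside and elem.text:
        count += len(WORD.findall(elem.text))
    for child in elem:
        count += stray_words(child, inside)
        # A child's tail belongs to elem's content, so it inherits elem's status.
        if not inside and child.tail:
            count += len(WORD.findall(child.tail))
    return count


def w_strategy(xml_text):
    """Return "a" (all words in <w>), "b" (some) or "c" (none) for <body>."""
    root = ET.fromstring(xml_text)
    body = root.find(f".//{TEI}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    if not body.findall(f".//{TEI}w"):
        return "c"
    return "a" if stray_words(body) == 0 else "b"
```

Punctuation wrapped in <pc> is not counted as a word here because `\w` does not match it; a real tool would also need to decide which textual planes (notes, titles, and so on) to include.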
We also have to decide in which textual plane to index the words found in the stream: the titles plane, the stage directions plane, the notes plane, the body plane, etc.
So it is not only a matter of expressing constraints on the input representation, but also of tuning the flexibility of its processing.
The constraints can also rely on information expressed outside the representation. For example, if the <w> elements encode morpho-syntactic information produced by a specific NLP tool, we may need to check constraints on the morpho-syntactic tagset, which is very rarely defined in the teiHeader or anywhere else in the text (typically because NLP tools change and rarely declare their output formally).
Another example is expressing constraints on the encoding of ontological information in texts that link to ontology repositories.

I know the next question will be why are we encoding it if we then throw it away -- and clearly the answer is we may throw it away for _this_ bit of processing or visualization or whatnot, but that doesn't mean it isn't crucial for other bits of analysis and research.  So your TEI Zero, for example, I can validate against its schema and if I don't get any warnings then I know that your software won't have a problem with it. If I do, I can judge if they are errors, or warnings like  -- "Your <name type="thingy"> will be treated as <name type="other"> in our software" -- then I can make an informed decision about how the software will work with my texts or whether I should convert them to match your values. I realise, of course, some people already do this, but it may be worth reiterating as it seems a lot more practical than people trying to develop software that will cope with any of the TEI vocabulary (never mind new things a project adds...).
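In TEI practice, the check James describes would normally live in the ODD itself (a closed value list for @type on <name>, or a Schematron rule). Purely as an illustration of the validate-then-decide workflow, here is a minimal Python sketch; the supported value list and both function names are hypothetical and not taken from TXM or any existing tool.

```python
import xml.etree.ElementTree as ET

TEI_NAME = "{http://www.tei-c.org/ns/1.0}name"

# Hypothetical: the <name> @type values one piece of software handles.
SUPPORTED_NAME_TYPES = {"person", "place", "org", "other"}


def check_name_types(xml_text, supported=SUPPORTED_NAME_TYPES):
    """Return one warning per <name> whose @type is outside `supported`."""
    root = ET.fromstring(xml_text)
    warnings = []
    for name in root.iter(TEI_NAME):
        value = name.get("type")
        if value is not None and value not in supported:
            warnings.append(
                f'Your <name type="{value}"> will be treated as '
                f'<name type="other"> in our software'
            )
    return warnings


def normalize_name_types(root, supported=SUPPORTED_NAME_TYPES):
    """Rewrite unsupported @type values to "other" in place; return the count."""
    changed = 0
    for name in root.iter(TEI_NAME):
        value = name.get("type")
        if value is not None and value not in supported:
            name.set("type", "other")
            changed += 1
    return changed
```

Running the check first and only then the normalization keeps the "informed decision" step James mentions: the encoder sees what the software would change before converting anything.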
The TEI vocabulary is fine; it is the flexibility of its usage that is difficult to cope with.
Currently, what we call 'TEI Zero' in the TXM universe has no fixed format. It is just an under-constrained specification of the small subset of TEI encodings that we have processed most in recent years for analysis and publishing, and that we want to help people work with.

-slh


-- 
Dr. Serge Heiden, [hidden email], http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - IHRIM UMR5317
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tel. +33(0)622003883

FW: advantages of TEI

Lou Burnard
In reply to this post by Elisa Beshero-Bondar
Back in the day, when the TEI "Projects" page was first created, it was a web form which projects could use to update their own descriptions, the thinking being that they would be best placed to document their own activities, and that it wasn't the TEI's job to collect, still less to curate, their activities. So while I totally agree that that list of links to project manuals is a very useful resource, and that it would be desirable to see it maintained and expanded (I think I even suggested we should somehow try to make an ODD library somewhere), I remain cautious of making this into a new TEI job, except insofar as we are all "the TEI", of course.

May I also say that I am rather unimpressed with the current state of the TEI Wiki? It's full of dead links and unfulfilled promises, and there doesn't seem to be any mechanism for improving it.



From: TEI (Text Encoding Initiative) public discussion list [[hidden email]] on behalf of Elisa Beshero-Bondar [[hidden email]]
Sent: 02 November 2016 17:39
To: [hidden email]
Subject: Re: advantages of TEI

Hi Piotr,
That page badly needs to be updated, too, and by each of the project maintainers--I keep forgetting to update the links to my projects there! 

We've been talking in Council and with Kevin Hawkins (who's working on a new incarnation of the TEI website) about how we really should have a page that guides project designers with sample ODD documentation geared to specific kinds of material (e.g. CBML, an ODD developed for projects that digitally model comic books, and other ODD customizations that seem useful starting points). Helping people to read and review project customizations would be the goal. Would a page featuring interesting and useful ODD customizations be an improvement?

Elisa

On Wed, Nov 2, 2016 at 1:26 PM, Piotr Bański <[hidden email]> wrote:
We're talking about this page:

http://www.tei-c.org/Activities/Projects/

and the sub-thread began with a bunch of links to project-level guidelines. You're not going to find these links on any of the subpages of Projects/

So the quick thought was: someone has done the tough job of looking through those projects and finding their local guidelines. It might (maybe) make sense to keep the results somewhere in a dynamic setting, where it can either grow into an even more useful list or.. not grow, and in the latter case no one is going to cry about that. The Projects/ page, useful for PR purposes perhaps, is not very useful as a quick source of local guidelines. (And note that I abstract away from whether these local guidelines are good/useful/whathaveyou. That's another thing, but whether we end up classifying them as useful or not, and maybe even noticing valuable or unwanted tendencies in preparing local guidelines, it's rather good to _have_ those guidelines at a distance of a single click).

But I'm not sure where to put them at a minimal time investment on my part, to make them maximally useful to others. Hence my initial question.

Best,

  P.



On 02/11/16 17:34, Lou Burnard wrote:
You prefer a single page with 80 links you have to clickety-click to
check if they're any use to you? To each their own...

Sent from my Honor Mobile

-------- Original Message --------
Subject: Re: advantages of TEI
From: Piotr Bański
To: Lou Burnard ,[hidden email]
CC:

Hi Lou,

Of course I know the page, and of course I don't think it's very useful
for this kind of listing. Clickety-click through 80 sub-pages, not me... :-)

Cheers,

   P.

On 02/11/16 14:39, Lou Burnard wrote:
There is a TEI Projects page on the website, which is where I would look
for this kind of information.

Sent from my Honor Mobile

-------- Original Message --------
Subject: Re: advantages of TEI
From: Piotr Bański
To: [hidden email]
CC:

A quick question to all: where in the TEI wiki structure would you
imagine this set of links?

@Paul: thanks for sharing! :-)

Best,

   Piotr

On 02/11/16 14:33, Magdalena Turska wrote:
Following on Paul's list, such project-specific policies could even be
called 'cheatsheets', as in Marjorie
Burghart's http://marjorie.burghart.online.fr/?q=en/content/tei-critical-apparatus-cheatsheet

Magdalena

On 2 November 2016 at 13:18, Paul Broyles <[hidden email]> wrote:

    Eduard,

    Regarding the question of how flexibility and freedom complicate
    development, many projects develop internal encoding guidelines to
    provide consistency among their texts. Thus if you're developing a
    display system for the output of one particular project, you have a
    much more limited set of possibilities to aim for.

    These documents can also be useful to provide to encoders who are
    relative TEI novices, as they can help explain how TEI encoding
    works in a more beginner-friendly way than the official documentation.

    Here are a handful of examples I have handy and have found useful
    (the last is from a project I'm working on, though I had no hand in
    the guidelines):
    http://www.newtonproject.sussex.ac.uk/resources/pdfs/techspec.pdf
    http://www.cdlib.org/groups/stwg/docs/MS_BPG.pdf
    http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines
    http://piers.iath.virginia.edu/resources/transcriptionalProtocols.html

    There are many more examples out there, though unfortunately they
    can be hard to search for as they go under varied names, from
    "Transcriptional Protocols" to "Encoding Guidelines" to "Technical
    Introduction."

    Paul
