Taxonomy Usage Question


Michael Joyce
Hello.

I have a collection of 800+ documents which have been marked up in TEI-Lite[1]. They’re mostly relatively small newspaper articles, just a few paragraphs. The collection could eventually expand into the thousands.

We need to classify or categorize the documents. I’m pretty new to TEI, so I’m not sure what the best way is.

My first thought is to create a new categories.xml document which contains a classDecl with several taxonomies (one for location, one for subject matter, etc.), each with its own set of categories.

I think I can link back to the categories with the catRef element, but the examples in the guidelines all show the categories and content in the same XML document.

Can I set up a single categories.xml document, and link to it in the catRef element? What syntax would I use?

Cheers,

Michael

1: http://www.tei-c.org/Guidelines/Customization/Lite/

Re: Taxonomy Usage Question

Sebastian Rahtz-3
>
> Can I set up a single categories.xml document, and link to it in the catRef element? What syntax would I use?
>
>

good idea

<catRef target="categories.xml#whatever"/>
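
A minimal sketch of what the shared file could look like, assuming a hypothetical categories.xml holding two taxonomies. The xml:id values (location, subject, loc-downtown, subj-politics), the titles, and the category descriptions are invented for illustration and are not taken from the project being discussed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- categories.xml: hypothetical shared taxonomy file -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Shared taxonomies</title></titleStmt>
      <publicationStmt><p>Internal project file.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <classDecl>
        <!-- One taxonomy per facet; each category carries an xml:id
             so that other documents can point at it -->
        <taxonomy xml:id="location">
          <category xml:id="loc-downtown">
            <catDesc>Articles about the downtown area</catDesc>
          </category>
        </taxonomy>
        <taxonomy xml:id="subject">
          <category xml:id="subj-politics">
            <catDesc>Articles about politics</catDesc>
          </category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body><p>This file only carries the taxonomies.</p></body>
  </text>
</TEI>
```

An article would then point at a category from the textClass element in its own header, for example with scheme="categories.xml#location" and target="categories.xml#loc-downtown" on a catRef. The relative path assumes the article and categories.xml sit in the same directory.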


Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

Re: Taxonomy Usage Question

Michael Joyce
Thank you Sebastian. I didn’t think it would be that easy!

Michael


Re: Taxonomy Usage Question

Sebastian Rahtz-3
> On 11 May 2015, at 18:19, Michael Joyce <[hidden email]> wrote:
>
> Thank you Sebastian. I didn’t think it would be that easy!
>
Sebastian’s rule of thumb:

 * working out the "right" TEI markup is easy
 * applying the markup consistently is fairly hard
 * extracting information from the markup afterwards is very hard




Re: Taxonomy Usage Question

Ondine LeBlanc
I like this set of rules. I think I'll print that out and keep it close by at all times.

--
Ondine LeBlanc, Director of Publications
Massachusetts Historical Society
1154 Boylston Street, Boston, MA 02215
Tel: 617-646-0524, Fax: 617-859-0074
www.masshist.org - America's First Historical Society - Founded 1791


Re: Taxonomy Usage Question

Fabio Ciotti-3
In reply to this post by Sebastian Rahtz-3
 * working out the "right" TEI markup is easy
 * applying the markup consistently is fairly hard
 * extracting information from the markup afterwards is very hard

Somehow, rule 3 shows rule 1 in a bad light...

f

Re: Taxonomy Usage Question

Martin Holmes
In reply to this post by Sebastian Rahtz-3
Hi Michael,

We use private URI schemes for this. You can see an example here:

<http://mapoflondon.uvic.ca/ABCH1.xml>

which has:

<textClass>
   <catRef scheme="mdt:molDocumentTypes" target="mdt:mdtBornDigital"/>
   <catRef scheme="mdt:molDocumentTypes"
           target="mdt:mdtEncyclopediaLocationStreet"/>
</textClass>

which is dereferenced through this prefixDef:

<prefixDef ident="mdt" matchPattern="(.*)"
           replacementPattern="http://mapoflondon.uvic.ca/includes.xml#$1">
   <p>The mdt (MoEML Document Type) prefix used on
      <gi>catRef</gi>/<att>target</att> points to a central taxonomy
      in the includes file.</p>
</prefixDef>

and points to a <taxonomy> in includes.xml, containing e.g.:

<category xml:id="mdtBornDigital" n="Born-digital documents">
   <catDesc>Born-digital documents created as part of this project,
      and not based on any pre-existing source text.</catDesc>
</category>
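
To make the dereferencing step concrete: with the prefixDef above, the part of mdt:mdtBornDigital after the prefix is matched by (.*) and substituted for $1, giving http://mapoflondon.uvic.ca/includes.xml#mdtBornDigital. The same mechanism might be adapted to the categories.xml case roughly as sketched below. The prefix name cat, the example.org URL, the id whatever, and the header contents are placeholders and not part of any real project:

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Sample article</title></titleStmt>
    <publicationStmt><p>Unpublished sample.</p></publicationStmt>
    <sourceDesc><p>Hypothetical newspaper article.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <listPrefixDef>
      <!-- Hypothetical prefix: cat:X expands to .../categories.xml#X -->
      <prefixDef ident="cat" matchPattern="(.*)"
                 replacementPattern="http://example.org/categories.xml#$1">
        <p>The cat prefix points to categories in the shared taxonomy file.</p>
      </prefixDef>
    </listPrefixDef>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <!-- Expands to http://example.org/categories.xml#whatever -->
      <catRef target="cat:whatever"/>
    </textClass>
  </profileDesc>
</teiHeader>
```

With this arrangement, if the taxonomy file moves, only the replacementPattern needs to change; the catRef pointers stay as they are.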

Cheers,
Martin




Re: Taxonomy Usage Question

Sebastian Rahtz-3
In reply to this post by Fabio Ciotti-3

> On 11 May 2015, at 19:29, Fabio Ciotti <[hidden email]> wrote:
>
>  * working out the "right" TEI markup is easy
>  * applying the markup consistently is fairly hard
>  * extracting information from the markup afterwards is very hard
>
> Somehow, rule 3 shows rule 1 in a bad light...

that tension is the very essence of the TEI







Re: Taxonomy Usage Question

Fabio Ciotti-3

>>  * working out the "right" TEI markup is easy
>>  * applying the markup consistently is fairly hard
>>  * extracting information from the markup afterwards is very hard
>>
>> Somehow, rule 3 shows rule 1 in a bad light...
>
> that tension is the very essence of the TEI


OK, with this last axiom you have finally defined "what TEI really is".
It should be carved in stone and used as the epigraph on the next version of the TEI-C website!


Re: Taxonomy Usage Question

Martin Holmes
In reply to this post by Sebastian Rahtz-3
On 15-05-11 11:34 AM, Sebastian Rahtz wrote:

>
>> On 11 May 2015, at 19:29, Fabio Ciotti <[hidden email]> wrote:
>>
>>   * working out the "right" TEI markup is easy
>>   * applying the markup consistently is fairly hard
>>   * extracting information from the markup afterwards is very hard
>>
>> Somehow, rule 3 shows rule 1 in a bad light...
>
> that tension is the very essence of the TEI

I don't agree with this. Why do you find it hard to get data out of
markup? XSLT and XQuery are designed to do exactly that.
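
As a rough illustration of the extraction step, here is a minimal XSLT 1.0 sketch that lists the category pointers used in one document. It assumes the catRef elements sit inside textClass in the TEI namespace, as in the examples earlier in this thread. It only prints the raw target values and does not resolve them against the taxonomy file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Plain-text output: one line per catRef -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Visit every catRef found inside a textClass element -->
    <xsl:for-each select="//tei:textClass/tei:catRef">
      <!-- Emit the value of the target attribute as written -->
      <xsl:value-of select="@target"/>
      <!-- Newline after each pointer -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to run this against a single article. Doing the same across a whole collection, or following each pointer into the taxonomy, would need more work than this sketch shows.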

I think #1 is actually the hardest bit, especially in areas where
multiple approaches are supported by the Guidelines.

Cheers,
Martin

Re: Taxonomy Usage Question

Sebastian Rahtz-3

>
> I don't agree with this. Why do you find it hard to get data out of markup? XSLT and XQuery are designed to do exactly that.
>
XSLT and XQuery are good at implementing the solution, if you can work out what the solution is.

> I think #1 is actually the hardest bit, especially in areas where multiple approaches are supported by the Guidelines.
>
OK, let's try:

  * working out a "right" TEI markup is easy

:-}







Re: Taxonomy Usage Question

Hugh Cayless-2
In reply to this post by Martin Holmes
Yeah, I agree with you, Martin, unless Sebastian means that it’s very hard to do #3 if you haven’t done #1 and #2 well. Then, yes, most definitely.



Re: Taxonomy Usage Question

Lou Burnard-6
In reply to this post by Michael Joyce
Or you could use XInclude, of course.

(in the header, wherever you think the taxonomy should be defined)

<xi:include href="categories.xml"/>

(in the text)

<catRef target="#whatever"/>

This has the advantage that you can maintain categories.xml
separately, without having to update all your <catRef>s if it changes
its location someday.
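
A slightly fuller sketch of the XInclude approach, under two assumptions that go beyond what is stated here: that categories.xml has a single taxonomy element (in the TEI namespace) as its root, so that the included content is valid inside classDecl, and that the header contents shown are placeholders. Note that XInclude uses the href attribute to locate the file, and that the xi prefix must be bound to the XInclude namespace:

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:xi="http://www.w3.org/2001/XInclude">
  <fileDesc>
    <titleStmt><title>Sample article</title></titleStmt>
    <publicationStmt><p>Unpublished sample.</p></publicationStmt>
    <sourceDesc><p>Hypothetical newspaper article.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <classDecl>
      <!-- Replaced by the root element of categories.xml
           once XInclude processing has been run -->
      <xi:include href="categories.xml"/>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <!-- Local pointer: only resolves after the inclusion,
           when the category with this xml:id is part of the document -->
      <catRef target="#whatever"/>
    </textClass>
  </profileDesc>
</teiHeader>
```

The inclusion has to be resolved before validation or further processing, for example with the --xinclude option of xmllint. Until then the local pointer #whatever has nothing in the document to point at.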


Re: Taxonomy Usage Question

Lou Burnard-6
In reply to this post by Martin Holmes
You won't be surprised to hear that I definitely agree with Martin on
this one. If you do the markup right, by definition processing it is
easier, because the markup is right.

Consistency, however, is the very devil.



Re: Taxonomy Usage Question

Stuart A. Yeates
In reply to this post by Sebastian Rahtz-3
Can I suggest that your transition to a future world of linked data
will be less painful if you use absolute URLs rather than relative
ones:

<catRef target="http://example.org/categories.xml#whatever"/>

cheers
stuart
--
...let us be heard from red core to black sky

