TEI Dictionaries - information about forms


Jonathan Robie
I have several lexicons in which <orth> contains more than the lexeme itself, e.g.

         <orth>Ἀαρών, ὁ</orth>


The lexeme is Ἀαρών; it is masculine, so it takes the article ὁ.  Is that the right way to use <orth>?  It seems to conflate two concerns.  What is the best way to encode this information?


Thanks!


Jonathan

Re: TEI Dictionaries - information about forms

Piotr Bański
Hello Jonathan,

in the TEI-Lex0 taskforce that is in the process of formulating
streamlined baseline recommendations for dictionary encoding, we have so
far arrived at something like the following, for your case:

<entry [attributes, among them @xml:id, and xml:lang here or on the
<text> element above]>
   <form type="lemma"><orth>Ἀαρών</orth></form>
   <gramGrp><pos>ὁ</pos></gramGrp>
   <sense>...
...

I have treated ὁ above as a symbol, rather than an orthographic form,
because this is the role that it plays in the entry. Variations depend
on the entire system that you assume; for example, you could do:

<pos ana="#gender_m">ὁ</pos>

if you used a separately described taxonomy of grammatical features.
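
A minimal sketch of what such a taxonomy might look like, assuming it is declared under <classDecl> in the TEI header. The identifiers gram_features, gender_m and Aaron and the language code grc are made up for illustration; they are not TEI-Lex0 recommendations. The block shows two separate fragments, not one document.

```xml
<!-- Fragment 1 (hypothetical): goes in teiHeader/encodingDesc -->
<classDecl>
  <taxonomy xml:id="gram_features">
    <category xml:id="gender_m">
      <catDesc>masculine gender</catDesc>
    </category>
  </taxonomy>
</classDecl>

<!-- Fragment 2: an entry whose <pos> points at that category.
     The xml:id and xml:lang values are assumed. -->
<entry xml:id="Aaron" xml:lang="grc">
  <form type="lemma"><orth>Ἀαρών</orth></form>
  <gramGrp><pos ana="#gender_m">ὁ</pos></gramGrp>
  <!-- senses omitted -->
</entry>
```

With this in place, the printed symbol ὁ stays in the text while the normalized value (masculine) lives in one place and can be queried through @ana.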

Another question is how badly you need the comma in the visualisation of
your dictionary. It could be added by means of styling, on the way to
the display.
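
One way this could be done is sketched below as an XSLT 1.0 stylesheet with plain-text output. It assumes the entry structure shown above (a lemma <form> plus an entry-level <gramGrp> with <pos>) and the standard TEI namespace; a real dictionary would need more templates.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Print "headword, symbol" for each entry; the comma is
       supplied here and is not stored in the source. -->
  <xsl:template match="tei:entry">
    <xsl:value-of select="tei:form[@type='lemma']/tei:orth"/>
    <xsl:if test="tei:gramGrp/tei:pos">
      <xsl:text>, </xsl:text>
      <xsl:value-of select="tei:gramGrp/tei:pos"/>
    </xsl:if>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text so only headword lines are output -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The intent is that the Ἀαρών entry is displayed as the original "Ἀαρών, ὁ" while <orth> holds only the lexeme.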

HTH,

   Piotr

--
Piotr Bański, Ph.D.
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany

Re: TEI Dictionaries - information about forms

Laurent Romary
Hi all,
In http://jtei.revues.org/540, Werner Wegstein and I suggested something like the following for a similar case:
<form type="lemma">
 <gramGrp>
   <gen norm="feminine">die</gen>
 </gramGrp>
 <orth>Katze</orth>
</form>

As to the comma, you can have an additional <pc>, </pc>, which can occur in <gramGrp> or <form>.
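
Applied to the Greek example from the original question, this pattern might look roughly as follows. This is only a sketch: the xml:id and xml:lang values are assumed, and the element order follows the printed order (headword, comma, article).

```xml
<entry xml:id="Aaron" xml:lang="grc">
  <form type="lemma">
    <orth>Ἀαρών</orth>
    <!-- The comma is kept as explicit punctuation -->
    <pc>, </pc>
    <gramGrp>
      <gen norm="masculine">ὁ</gen>
    </gramGrp>
  </form>
  <!-- senses omitted -->
</entry>
```

Here the comma is preserved in the source instead of being generated at display time.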
@Piotr: these are cases where <gramGrp> in <form> would make more sense, wouldn't they? Maybe we should record such configurations in TEI-Lex0.
Laurent


Laurent Romary
Inria, team Alpage






Re: TEI Dictionaries - information about forms

Piotr Banski
Hi Laurent,

On 03/11/17 06:19, Laurent Romary wrote:
[..]
> @Piotr: these are cases where <gramGrp> in <form> would make more sense,
> would not they? Maybe we should record such configurations in TEI-Lex0

No, exactly not. We've been through this, and the decision was clear to
all, it seemed. :-)

The masculine gender is the property of the entry, rather than just the
lemma.
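
To illustrate the distinction, here is a hypothetical entry (not from the lexicons discussed in this thread) in which gender is stated once at entry level, while information that holds for a single form only, such as case and number, sits inside that form. The choice of λόγος, the xml:id and the type value "inflected" are assumptions made for the example.

```xml
<entry xml:id="logos" xml:lang="grc">
  <form type="lemma"><orth>λόγος</orth></form>
  <form type="inflected">
    <orth>λόγου</orth>
    <!-- Holds for this form only -->
    <gramGrp>
      <case>genitive</case>
      <number>singular</number>
    </gramGrp>
  </form>
  <!-- Holds for the whole entry, so it is a sibling of the forms -->
  <gramGrp><pos>ὁ</pos></gramGrp>
  <!-- senses omitted -->
</entry>
```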

Best,

   Piotr