Feature Structures remediation?


Hayim Lapin
I am writing for advice, on or off list, about revising a feature-structure system for morphological markup. I have a bunch of questions (including how to structure a stand-alone features document), but for the moment I want to ask about the roles of fvLib and fLib. Below are two sets of examples: one from the ISO spec for MAF [1], illustrating how this could be done in compliance with TEI; the other from a recent article in jTEI [2].

In the model of Example 1, the fvLib seems to create a list of name/atomic-value pairs that are then reused. The feature library itself only lists values relevant to (French) nouns, but it is generic enough to include masculine and feminine, singular and plural. An example preceding this one shows that it is the fLib entries (i.e., #pos.n, not #noun) that are referenced in the analysis. Why is this not unnecessary duplication? Does the fvLib give the "available" values, and the fLib their "applied" use?

In Example 2, the model seems to be the reverse: the fLib creates a list of atomic values, and the fvLib lists possible morphological combinations, which are then referenced from @ana on <form>. Am I missing something here? Is Example 2 simply creating the name/symbol pairs in the feature definitions themselves and skipping the fvLib step?

Featureless, near Washington DC,
HL


Example 1
<fvLib n="French morpho values"> 
<symbol xml:id="noun" value="noun"/>
<symbol xml:id="sing" value="singular"/> 
<symbol xml:id="plu" value="plural"/> 
<symbol xml:id="masc" value="masculine"/> 
<symbol xml:id="fem" value="feminine"/> 
</fvLib>
<fLib> 
<f xml:id="pos.n" name="pos" fVal="#noun"/> 
<f xml:id="num.sg" name="number" fVal="#sing"/> 
<f xml:id="num.p" name="number" fVal="#plu"/> 
<f xml:id="gen.f" name="gender" fVal="#fem"/> 
<f xml:id="gen.m" name="gender" fVal="#masc"/> 
</fLib>
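To make the chain of references in Example 1 concrete, here is a minimal sketch of how its definitions might be consumed: an <fs> points at <f> elements in the fLib through @feats, each of those <f> elements points at a <symbol> in the fvLib through @fVal, and the text points at the <fs> through @ana. The xml:id "n.f.pl" and the word "maisons" are hypothetical illustrations, not taken from the MAF spec.

```xml
<!-- Hypothetical feature structure for a feminine plural noun.
     @feats points at <f> elements in Example 1's fLib,
     not at the <symbol> elements in its fvLib. -->
<fs xml:id="n.f.pl" feats="#pos.n #gen.f #num.p"/>

<!-- Hypothetical use in a transcription: @ana points at the <fs>. -->
<w ana="#n.f.pl">maisons</w>
```

In this arrangement nothing in the text refers to the fvLib directly; its symbols are reached only through the @fVal of the fLib entries.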

Example 2
<fvLib>
   ...
   <fs xml:id="v_pres_ind_pl_p2" name="v_pres_ind_pl_p2"
                 feats="#pos.verb #tns.pres #mood.ind #num.pl #pers.2"/>
   ...
</fvLib>
<fLib>
   <f xml:id="pos.verb" name="pos"><symbol value="verb"/></f>
   ...
   <f xml:id="tns.pres" name="tense"><symbol value="present"/></f>
   ...
   <f xml:id="mood.ind" name="mood"><symbol value="indicative"/></f>
   ...
   <f xml:id="num.pl" name="number"><symbol value="plural"/></f>
   ...
   <f xml:id="pers.2" name="person"><symbol value="2nd"/></f>
   ...
</fLib>
<!-- elsewhere -->
...
   <form type="inflected" ana="#v_pres_ind_pl_p1 #v_pres_ind_pl_p3">
      <orth>gehen</orth>
   </form>
...
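For comparison, the two examples give a feature its value in two different ways: Example 1 points from the <f> to a value held in an fvLib, while Example 2 places the value inline inside the <f>. The sketch below puts one feature from each example side by side (ids reused from the examples; combining both styles in one file is purely for illustration).

```xml
<!-- Example 1 style: the atomic value is defined once in an fvLib,
     and the <f> points at it with @fVal -->
<fvLib>
  <symbol xml:id="plu" value="plural"/>
</fvLib>
<fLib>
  <f xml:id="num.p" name="number" fVal="#plu"/>
</fLib>

<!-- Example 2 style: no fvLib entry for the atomic value;
     the <symbol> sits directly inside the <f> -->
<fLib>
  <f xml:id="num.pl" name="number"><symbol value="plural"/></f>
</fLib>
```

Either way, an <fs> refers to the feature identically (feats="#num.p" or feats="#num.pl"); what differs is only where the atomic value lives, which in Example 2 leaves the fvLib free to hold whole <fs> bundles instead.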


[1] ISO 24611:2012, Language resource management: Morpho-syntactic annotation framework (MAF), ISO/TC 37/SC 4
[2] http://journals.openedition.org/jtei/522#tocto3n2 at paragraph 41

Robert H. Smith Professor of Jewish Studies and
Professor of History
Department of History
University of Maryland
2115 Francis Scott Key Hall
College Park, MD 20742
301 405 4296 | [hidden email]
www.digitalmishnah.org | www.eRabbinica.org

Director
Joseph and Rebecca Meyerhoff
Program and Center for Jewish Studies
University of Maryland
4141 Susquehanna Hall
College Park, MD 20742
301 405 4975 | [hidden email]
www.jewishstudies.umd.edu


Re: Feature Structures remediation?

Piotr Bański

Dear Hayim,

From what I can see in your example, one could think of fvLib and fLib roughly as you suggested: the former contains, e.g., the values defined by the grammatical system of the given language, and the latter is basically a set of shortcuts. One could also imagine an fvLib being logically composite, containing complex feature values defined by referencing multiple atomic values (held in other, simpler fvLibs). All of this is not only heavily language-dependent but also theory- and application-dependent. (Or perhaps instead of "application" I should say "use case": say someone decides to encode the full extent of a feature-geometric, hierarchical system, first separating out the atomic values, then encoding their possible compositions, and so on, and finally linking them together with their "containers" in an fLib. That is doable, given enough years and sometimes some scripting, but it is not a task for the faint of heart, unless the system to be described is ultra-simple.) And one could, devilishly, also throw the DCR attributes (att.datcat) into the mix, for the purpose of anchoring labels such as "noun" or "Subst" in a single node of some agreed attribute|value ontology/taxonomy.
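A composite arrangement of the kind just described might look roughly like this: one fvLib holds atomic <symbol> values, a second fvLib holds <fs> values assembled from them via @fVal, and an fLib entry uses one of those bundles as the value of a single feature. This is only a sketch; every name and xml:id in it is hypothetical.

```xml
<!-- Atomic values -->
<fvLib n="atomic values">
  <symbol xml:id="val.p3" value="3rd"/>
  <symbol xml:id="val.pl" value="plural"/>
  <symbol xml:id="val.fem" value="feminine"/>
</fvLib>

<!-- Complex values composed by pointing at the atomic ones -->
<fvLib n="agreement bundles">
  <fs xml:id="agr.3pl.fem" type="agreement">
    <f name="person" fVal="#val.p3"/>
    <f name="number" fVal="#val.pl"/>
    <f name="gender" fVal="#val.fem"/>
  </fs>
</fvLib>

<!-- A feature whose value is the whole bundle -->
<fLib>
  <f xml:id="agr.f.3pl" name="agreement" fVal="#agr.3pl.fem"/>
</fLib>
```

Each further level of a hierarchical feature system would add another layer of the same kind, which is where the scripting tends to become necessary.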

In the examples you cite, it looks like what's at stake is compactness of the <fs>, especially in the example that uses @feats:

<fs xml:id="v_pres_ind_pl_p2" name="v_pres_ind_pl_p2"
                 feats="#pos.verb #tns.pres #mood.ind #num.pl #pers.2"/>

 -- on the one hand, I'm trying not to imagine the full-blown representation of this; on the other, such an approach can also be useful when coupled with a closed-world assumption, namely that only the combinations listed in this way are legitimate combinations of POS and morphosyntactic features for the given language.
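For comparison with the compact @feats form quoted above, a spelled-out equivalent might look roughly like this, with feature names and values copied from the fLib of Example 2. This is one possible expansion sketched for illustration, not the representation used in the article.

```xml
<!-- Compact: the <fs> only points at <f> elements in the fLib -->
<fs feats="#pos.verb #tns.pres #mood.ind #num.pl #pers.2"/>

<!-- Expanded: the same five features written out in place -->
<fs>
  <f name="pos"><symbol value="verb"/></f>
  <f name="tense"><symbol value="present"/></f>
  <f name="mood"><symbol value="indicative"/></f>
  <f name="number"><symbol value="plural"/></f>
  <f name="person"><symbol value="2nd"/></f>
</fs>
```

Under the closed-world reading, the set of <fs> elements stored in the fvLib doubles as the list of permitted combinations, so a (hypothetical) consistency check could flag any @ana value that does not resolve to one of them.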

Does this help in any way? My point is also that, for your use case and your assumed model of grammar, mimicking this solution in full might legitimately be considered overkill -- maybe that's what made you wonder.

Best regards,

   Piotr


On 02/12/18 20:15, Hayim Lapin wrote:


-- 
Piotr Bański, Ph.D.
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68161 Mannheim, Germany