att.lexicographic.normalized

Maarten Janssen
Dear list,

I have two questions concerning the “new” att.lexicographic.normalized - basically concerning the two parts of the name:

a - why is it labelled as lexicographic? As far as I can see, also from the examples, it is an orthographic normalization that may or may not be taken from a dictionary citation form. Dictionaries always have a “normalized” form (although they might have more than one - it is far from uncommon to have multiple citation forms for a single entry), but there are documents in languages without dictionaries, corpora that predate dictionaries, and editors who simply have different views on how to normalize. So dictionaries have to deal with normalization, but there are many other ways to normalize. So why lexicographic?

b - why is it @norm and not @reg? There is a motivation behind the fact that in <choice> it is called a regularization and not a normalization, and I would say that the exact same motivation applies here. So it seems incoherent to have a @norm on elements like <w> and a <reg> inside <choice> when they do largely the same thing.
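
To make the comparison in (b) concrete, here is a minimal sketch of the two encodings side by side, read with Python's standard xml.etree.ElementTree. The token "vppon" with the regularized form "upon" is an invented example, and the helper name normalized_forms is hypothetical; nothing here is taken from the Guidelines beyond the element and attribute names mentioned above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: the same token encoded once with @norm on <w>
# and once with <orig>/<reg> inside <choice>.
SAMPLE = f"""
<s xmlns="{TEI_NS}">
  <w norm="upon">vppon</w>
  <choice><orig>vppon</orig><reg>upon</reg></choice>
</s>
"""


def normalized_forms(xml_text):
    """Return (mechanism, original form, normalized form) triples."""
    root = ET.fromstring(xml_text.strip())
    forms = []
    for el in root.iter():
        if el.tag == f"{{{TEI_NS}}}w" and "norm" in el.attrib:
            # Attribute-based: the normalized form sits on the token itself.
            forms.append(("w/@norm", el.text, el.get("norm")))
        elif el.tag == f"{{{TEI_NS}}}choice":
            # Element-based: original and regularized readings are siblings.
            orig = el.find(f"{{{TEI_NS}}}orig")
            reg = el.find(f"{{{TEI_NS}}}reg")
            if orig is not None and reg is not None:
                forms.append(("choice/reg", orig.text, reg.text))
    return forms


if __name__ == "__main__":
    for mechanism, original, normalized in normalized_forms(SAMPLE):
        print(mechanism, original, normalized)
```

Both mechanisms yield the same pair of original and normalized form for this token, which is the overlap the question is about.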

I might be missing motivations here, but can anyone enlighten me?

Maarten

Re: att.lexicographic.normalized

Laurent Romary
Dear Maarten,
The answer to your two questions is the same. It all comes from the dictionary module: some of us noticed that the attributes declared under att.lexicographic.normalized could also be useful elsewhere. If you look at https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.lexicographic.normalized.html, you’ll see that dictionary elements still form a strong majority. But indeed, why not extend it further?
I wish all of you, members of the extraordinary community that is the TEI, happy holidays,
Laurent


On 23 Dec 2020, at 16:32, Maarten Janssen <[hidden email]> wrote:

Dear list,

I have two questions concerning the “new” att.lexicographic.normalized - basically concerning the two parts of the name:

a - why is it labelled as lexicographic? As far as I can see, also from the examples, it is an orthographic normalization that may or may not be taken from a dictionary citation form. Dictionaries always have a “normalized” form (although they might have more than one - it is far from uncommon to have multiple citation forms for a single entry), but there are documents in languages without dictionaries, corpora that predate dictionaries, and editors who simply have different views on how to normalize. So dictionaries have to deal with normalization, but there are many other ways to normalize. So why lexicographic?

b - why is it @norm and not @reg? There is a motivation behind the fact that in <choice> it is called a regularization and not a normalization, and I would say that the exact same motivation applies here. So it seems incoherent to have a @norm on elements like <w> and a <reg> inside <choice> when they do largely the same thing.

I might be missing motivations here, but can anyone enlighten me?

Maarten

Laurent Romary
Inria, team ALMAnaCH