sporadic information

sporadic information

Michael Sperberg-McQueen
Lou's problem of translating sporadically recorded information into TEI
form will surely, as David Chesnutt has already observed, be a very
common and important one.  I'm with Frank Tompa on this one, though:
the TEI scheme in its current form already provides methods for (a)
marking place names and (b) providing a linguistic analysis or
categorization of a word or phrase.  So I don't at all see what the
problem is:  when the source text marks 'Bath' as a place name, tag it
as a place name, and when the source text provides word-class
information for a word, tag it in the usual way.

Lou suggests that this would be inelegant or misleading, since not all
place names are so tagged, and not all words are classified.

But consider the alternatives:

    1 lose the information
    2 leave the information in its non-TEI form
    3 complete the tagging (i.e. tag all the rest of the place names,
         and give part of speech and sense number for all words),
         so the tagging is consistent and complete, and then use
         the existing TEI tags
    4 use the existing TEI tags, and note in the header that not
         all words are classed, not all place names are tagged, ...
    5 invent new TEI-style tags, and continue to note in the header
         that not all words are classed, and not all place names
         tagged

Of these, 1 is a bad idea. 2 is occasionally tempting, especially for
things one doesn't know how to handle in TEI tagging, but it really just
means engaging in only a partial conversion to TEI markup.  Particularly
when the information left unconverted *does* have a TEI form, such texts
should not be regarded as TEI conformant.  3 is a pipedream in most
cases, and violates the spirit of TEI's role as a format for interchange
of texts without incurring information loss or requiring information
enrichment.

The only difference I see between 4 (which Lou was uncomfortable with,
and which prompted his inquiry) and 5 (which he suggests as a solution)
is that the one uses the existing tags, and the other doesn't.  I don't
see that as a big advantage for choice 5, myself.  Why on earth do we
want to distinguish between the concepts PLACENAME (for which we have a
tag) and PLACENAME-TAGGED-EVEN-THOUGH-OTHER-PLACENAMES-ARE-NOT-TAGGED
(for which we don't, yet, though it would be a legal tag name)?

Perhaps a fuller description is needed in the header to allow users to
specify how consistently and how thoroughly various tags (particularly
for text enrichment) have been used.  But not new tags for the same old
information.
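
To make the header idea concrete, here is a minimal sketch, in Python, of
how a conversion program might record that a tag has been applied only
where the source marked it.  The element and attribute names (placeName,
tagUsage, gi, occurs) follow later TEI practice and are assumptions for
illustration, as are the sample sentence and the helper coverage_note;
none of them comes from the proposal under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the source marks only 'Bath' as a place name.
# 'Bristol' is also a place name, but the source did not mark it.
SAMPLE = "<p>He left <placeName>Bath</placeName> for Bristol.</p>"


def coverage_note(text_xml: str, tag: str) -> ET.Element:
    """Build a header-style note saying that `tag` is used only sporadically."""
    body = ET.fromstring(text_xml)
    # Count every occurrence of the tag, including the root element itself.
    count = sum(1 for _ in body.iter(tag))
    # 'tagUsage', 'gi' and 'occurs' are assumed names, borrowed from later
    # TEI practice; the original discussion does not specify them.
    note = ET.Element("tagUsage", {"gi": tag, "occurs": str(count)})
    note.text = (
        f"Only those {tag} elements marked in the source are tagged; "
        "other place names may remain untagged in the text."
    )
    return note


note = coverage_note(SAMPLE, "placeName")
print(ET.tostring(note, encoding="unicode"))
```

The text keeps the ordinary place-name tag, and the statement about
incomplete coverage lives in one place, which is the substance of
choice 4.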

Wearing no hat at all except a woolen cap to keep out Chicago's wind,

Michael Sperberg-McQueen

Re: sporadic information

koontz
Perhaps TEI needs, for some tags, companion tags indicating that the
original tags are not being used consistently: for example, a companion
to the placename tag indicating that placename tags are not applied
consistently.  (This may sound a bit as if I am poking fun, but I am
not.)