Robin C. Cover
Re: comments by Peter Robinson on encodings for textual variation

> It seems a little odd to me that we mark explicitly the beginning and
> end of the lemma in the base text, but we do not mark it explicitly in
> the variant. Of course, when one is looking at the variant only within
> the apparatus this does not matter: the whole variant is given, placed
> beside the lemma, so the beginning and end of the variant declare
> themselves. But one can imagine many circumstances where one is not
> looking at the variant within the apparatus. For example, one might be
> reading through the variant source itself, rather than just reading
> bits of it decomposed through an apparatus. If one marked the beginning
> and end of the variant text, as well as the lemma, those markers could
> then be read back into the variant source, and could then be used to
> "look up" the parallel text in the master, or in some other text.

Whether "odd" or not, I can think of several reasons why one might not
wish to require explicit marking of endpoints with IDs (<anchor>) for
variant(s), as with the lemma. Here are three:

(1) The encoder presumably has access to machine-readable copy of the base
text being annotated for textual variation, but machine-readable editions
for hundreds or thousands of other relevant witnesses may NOT be in
electronic form.  In the case of biblical, cuneiform and other oriental
texts, I suspect this will be the norm for decades to come.  The tagger
may wish to encode variants from other editions, having no alternative but
to cite variants known only from hard-copy sources, and then not with SGML
IDs.   Often the textual data in these sources will be sporadic,
occasional or otherwise incomplete: UN-contextualized word- or
phrase-level variants recorded by earlier textual critics (e.g.,
Hexaplaric readings in biblical studies; occasional variant readings cited
from unpublished tablets in the Chicago Assyrian Dictionary volumes or
Akkadisches Handwoerterbuch).  Mechanisms are suggested in the TEI
Guidelines for using regular-expression-like notation for locating a cited
text string in its hardcopy (brick/clay tablet/papyrus/paper) source if
the encoder has access to that hardcopy source in full, and if there is
necessity for a robust mapping scheme.  See Guidelines section 5.7, esp.
5.7.3.  But the notation "<anchor id=xx>" would not be appropriate as
such, and similar mechanisms suggested in 5.7.3 may entail encoding labor
far exceeding the tagger's interest/patience.  Rather than using SGML IDs,
explicit locators would have to make use of the hardcopy source's native
or canonical referencing scheme down to the level of a parsable range,
intended for humans initially, but if done well, for conversion by
machines at some later time when the hardcopy source is digitized.  But
this provision will still not assist the encoder in making use of
double-endpoint attachment in cases when isolated single-word/phrase
variants are known from casual/incomplete and sometimes un(der)documented
sources which themselves do not merit representation using a section 5.7.3
mechanism.
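In sketch form, the contrast might look like this: the base reading is
anchored with SGML IDs, while the hardcopy-only variant carries nothing but
a human-readable canonical reference.  The element, attribute, and siglum
names here are invented for illustration, not drawn from the Guidelines:

```sgml
<!-- Hypothetical sketch: a variant known only from a hardcopy source.
     The "loc" attribute and the sigla are illustrative inventions. -->
<app startpoint=a1 endpoint=a2>
  <rdg wit=BASE>reading of the base text</rdg>
  <rdg wit=CAD loc="CAD A/2, p. 325, unpublished tablet">variant reading</rdg>
</app>
```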

(2) A second common case when double-endpoint attachment is less
meaningful (or an unnecessary encumbrance) is when a textual variant has
been normalized or retroverted, say, from another language.  Suppose we
have a Hebrew source text with a phrase-level variant (involving
grammatical co-dependencies -- as in the simple example on p. 119), and
suppose we have Greek and German translation texts containing reflexes of
the various Hebrew readings.  The translated phrase may map out very
awkwardly over the Greek/German sentence, so in practice, the top-level
apparatus entry for the Hebrew text will be normalized to Hebrew, and the
Greek/German witnesses cited by siglum in support of the Hebrew
alternatives.  Linguists, and some text-critical researchers might indeed
wish to study the details of these theoretical Hebrew-German/Greek
mappings.   But it could easily overburden the average encoder (in this
case, of the Hebrew text) to require that multiple anchors be set in the
variant texts, and that startpoints (plural!) and endpoints of the
Greek and German discontinuous text segments be mapped correctly to the
elements of the Hebrew phrase: simple normalization and witness-list
(omitting any variant endpoints) would be sufficient.
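A sketch of such a normalized entry (sigla and attribute names invented for
illustration): everything is reduced to Hebrew, the translation witnesses
are cited by siglum alone, and no anchors are set in the Greek or German
texts:

```sgml
<!-- Hypothetical sketch: apparatus entry normalized to Hebrew.
     G (Greek) and D (German) support alternative Hebrew readings
     by siglum only; no endpoints are marked in their own texts. -->
<app startpoint=a1 endpoint=a2>
  <rdg wit="MT">first Hebrew reading</rdg>
  <rdg wit="G D">retroverted Hebrew reading</rdg>
</app>
```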

If it is desirable to supply double-endpoint attachment notation for
variant readings, perhaps this can be done under <witness-detail>
(Guidelines section 5.10.10).  Use of the method outlined in 6.2.5,
"explicit alignment of multiple analyses" (p. 142) would possibly work in
some cases, but could get very nasty (as I imagine the process of creating
the alignment) when one has hundreds or thousands of witnesses, in various
languages. I think only encoders who wish to specify this level of
segmentation and alignment should be required to do so; it should not be
expected as part of standard encoding.

(3) Requiring double-endpoint attachment notation for variant readings
might be a particularly aggravating nuisance, and sometimes nonsense, when
the variant text contains a zero variant.  It's very easy to assert that
witness C lacks the reading of the base text, but if witness C is given to
erratic stylistic transpositions anyway (e.g., for adverbs), where would
one (confidently) set the anchors for the null reading in the variant
text?  We don't usually think of a zero-variant on the base text, but if
we did, the "double-endpoint" attachment would make less sense than
single-point locus for a simple textual minus.  "Minuses" and "plusses"
make more sense once the encoding and quantification is done.  Do we
propose that the encoding support machine-permutation of the data such
that any text can become the "base" text, with all others "variant" texts?
Requiring double-endpoint notation for all variant readings sounds like a
manual step in this direction.  Sounds nice, but difficult, and perhaps
labor-intensive for the encoding phase.  These questions expose some of
the many difficulties in working from a base text against which all
variants are registered -- which, however, is probably inescapable.
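A sketch of the zero-variant case (tag and attribute names illustrative):
witness C's omission is easy to record against the base text, but there is
no non-arbitrary place in C's own text to hang a pair of anchors:

```sgml
<!-- Hypothetical sketch: witness C lacks the base reading entirely.
     The empty <rdg> records the minus; anchoring this null reading
     inside C's own text would be arbitrary at best. -->
<app startpoint=a1 endpoint=a2>
  <rdg wit="A B">reading of the base text</rdg>
  <rdg wit="C"></rdg>
</app>
```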

Re: the three/four alternative systems:

> This looks to me to be three systems too many. Double end-point
> attachment might be all we need:

Nobody would argue against simplicity, of course.  The TEI editors could
comment more authoritatively, but my memory is that alternative schemes
were proposed: (a) to encourage discussion of the theoretical problems,
(b) to accommodate text encoders in highly variable textual arenas.
Someone encoding textual variation when only two exemplars are known might
prefer the "parallel segmentation" method because it's more economical or
perspicuous at various levels (storage, browsing, parsing, etc.).
Similarly, "inline" and "external" alternatives were proposed to
accommodate different encoding goals, personal preferences, corpora of
widely variable textual richness, etc.  If indeed there's no need for
single-endpoint attachment, then we should dispense with it.  Does
double-endpoint attachment make sense always (e.g., for simple plus in the
variant text, could we use "<app point=a2>" rather than <app startpoint=a2
endpoint=a2>)?  Zero variants (simple plusses and minuses) point out the
privilege that the base text usually receives and the desideratum of
having variation expressible from a neutral (database) perspective or from
the viewpoint of any single witness.  Can this be had for free -- or
at a price encoders are willing to pay if 95% of their variant data is
only in hardcopy format?
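The two notations in question might look like this for a simple plus in
witness D; neither form is Guidelines-sanctioned syntax, and both are
sketches:

```sgml
<!-- A simple plus in witness D, attached to a single point in the
     base text.  Single-point form: -->
<anchor id=a2>
<app point=a2>
  <rdg wit="D">words present only in D</rdg>
</app>
<!-- The same plus forced into double-endpoint form, with the two
     endpoints collapsed onto one anchor: -->
<app startpoint=a2 endpoint=a2>
  <rdg wit="D">words present only in D</rdg>
</app>
```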

> I am suspicious of the system of "nesting" variants given in
> the guidelines.

I forget all the reasons advanced and discussed for nesting within <app>
entries.  Possible cases: (1) Nesting would be a valuable means of
grouping large-scale recensional variants.  Imagine two textual traditions
that are genetically related, but are clearly recensional variants, sharing
only 70% of content in common.  Both major recensions have extensive
textual variation in daughter traditions.  Nesting can be used to describe
oppositions that are meaningful at the lower levels.  (2) Nesting may be a
valuable means of grouping language witnesses in an encoding project that
wishes to trace textual variation in several languages.  Normalizations
would be required at key points, but the researcher may indeed wish to
fully encode all variation below a single parent language text in the same
database.  This is perhaps the only way to get control over complex
traditions in which retroversion of the most recent (language) levels to
the earliest does not make any historical sense (e.g., Hebrew 'Scriptures'
4th century BCE >> Old Greek translation(s), 3rd/2nd centuries BCE >>
Sahidic Coptic translation of (Old) Greek, 3rd century CE).  Of course,
(1) and (2) are not mutually exclusive motivations for supporting nesting.
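A nesting sketch along the lines of (1), with witness names and structure
invented for illustration: an outer <app> opposes the two recensions, and
an inner <app> records variation among one recension's daughters:

```sgml
<!-- Hypothetical sketch of nested apparatus entries.  The outer
     opposition is recensional; the inner <app> records lower-level
     variation within recension A's daughter tradition. -->
<app startpoint=a1 endpoint=a2>
  <rdg wit="RecA">
    shared recension-A text with
    <app><rdg wit=A1>one daughter reading</rdg>
         <rdg wit=A2>another daughter reading</rdg></app>
  </rdg>
  <rdg wit="RecB">substantially different recension-B text</rdg>
</app>
```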

A procedural question (I am perplexed by Lou's posting of PR's note on
TEI-L): do we wish to inflict upon **everyone** on TEI-L the gory details
of encoding textual variation?  Are not the TEI-REP, TEI-ANA (etc.)
listserv groups still active?  There is (supposed to be) a TEI Textual
Criticism Working Group -- would not a listserv for this WG be a better
discussion forum for the most boring details of encoding textual variation?

Robin Cover

Robin Cover's very helpful recent discussion of encoding textual variation
concluded with the following procedural question:

>...do we wish to inflict upon **everyone** on TEI-L the gory details
>of encoding textual variation?  Are not the TEI-REP, TEI-ANA (etc.)
>listserv groups still active?  There is (supposed to be) a TEI Textual
>Criticism Working Group -- would not a listserv for this WG be a better
>discussion forum for the most boring details of encoding textual variation?

During a quiet time on TEI-L, I asked the editors about subscribing to TEI-REP,
etc.  I was told that these ListServs were for internal discussion of work in
progress that was not yet polished enough for public consumption.  Since I was
not a member of the relevant committees, I would not be permitted to subscribe.
This seems quite reasonable; we all show drafts to selected colleagues before
we solicit general comments and suggestions.

It was explained that all relevant information would eventually be distributed
on the public list.  Material would be restricted to private lists not because
it was "boring" (i.e., of limited interest to the general readership), but
because it was unfinished.

This being the case, TEI-L is the only publicly accessible forum for
discussion of any TEI issues, both general and specific.  Relegating textual
criticism discussions to a private ListServ would make these discussions
inaccessible to at least one interested party.  I would prefer to receive as
much information as possible and take personal responsibility for what I do and
don't read, and I appreciate the recent postings on these important issues to a
publicly accessible list.

Two ways of dealing with specialized or extremely technical discussion might be
the following:

1) Post all information to TEI-L, where readers can select messages they wish
to read according to subject headings.

2) Use specialized lists for specialized discussions, but make these publicly
accessible.  This can involve either opening all specialized lists to the
general public or creating a separate set of lists for open discussion of
specialized questions.  This new set would differ from the old specialized
lists in that information would be directed there because it was specialized
and of limited general interest, rather than because it was too incompletely
formed for public consumption.

Each of these has its advantages.  The point of this posting isn't to start a
procedural wrangle, but to remind those with access to the specialized lists
that redirecting discussions there would exclude interested readers.

David J. Birnbaum   [hidden email] [Internet]
                    [hidden email]   [Bitnet]