Re: comments by Peter Robinson on encodings for textual variation
> It seems a little odd to me that we mark explicitly the beginning and
> end of the lemma in the base text, but we do not mark it explicitly in
> the variant. Of course, when one is looking at the variant only within
> the apparatus this does not matter: the whole variant is given, placed
> beside the lemma, so the beginning and end of the variant declare
> themselves. But one can imagine many circumstances where one is not
> looking at the variant within the apparatus. For example, one might be
> reading through the variant source itself, rather than just reading
> bits of it decomposed through an apparatus. If one marked the beginning
> and end of the variant text, as well as the lemma, those markers could
> then be read back into the variant source, and could then be used to
> "look up" the parallel text in the master, or in some other text.

Whether "odd" or not, I can think of several reasons why one might not wish to require explicit marking of endpoints with IDs (<anchor>) for variants, as with the lemma. Here are three:

(1) The encoder presumably has access to a machine-readable copy of the base text being annotated for textual variation, but editions of the hundreds or thousands of other relevant witnesses may NOT be available in electronic form. In the case of biblical, cuneiform and other oriental texts, I suspect this will be the norm for decades to come. The tagger may wish to encode variants from other editions, having no alternative but to cite variants known only from hard-copy sources, and thus not with SGML IDs. Often the textual data in these sources will be sporadic, occasional or otherwise incomplete: UN-contextualized word- or phrase-level variants recorded by earlier textual critics (e.g., Hexaplaric readings in biblical studies; occasional variant readings cited from unpublished tablets in the Chicago Assyrian Dictionary volumes or the Akkadisches Handwoerterbuch).
Mechanisms are suggested in the TEI Guidelines for using regular-expression-like notation to locate a cited text string in its hardcopy (brick/clay tablet/papyrus/paper) source, provided the encoder has access to that hardcopy source in full and a robust mapping scheme is needed. See Guidelines section 5.7, esp. 5.7.3. But the notation "<anchor id=xx>" would not be appropriate as such, and the similar mechanisms suggested in 5.7.3 may entail encoding labor far exceeding the tagger's interest or patience. Rather than using SGML IDs, explicit locators would have to use the hardcopy source's native or canonical referencing scheme down to the level of a parsable range, intended for humans initially but, if done well, convertible by machines at some later time when the hardcopy source is digitized. Even this provision will not assist the encoder in using double-endpoint attachment when isolated single-word or phrase variants are known from casual, incomplete and sometimes un(der)documented sources which themselves do not merit representation by a section 5.7.3 mechanism.

(2) A second common case where double-endpoint attachment is less meaningful (or an unnecessary encumbrance) arises when a textual variant has been normalized or retroverted, say, from another language. Suppose we have a Hebrew source text with a phrase-level variant (involving grammatical co-dependencies, as in the simple example on p. 119), and suppose we have Greek and German translation texts containing reflexes of the various Hebrew readings. The translated phrase may map very awkwardly over the Greek/German sentence, so in practice the top-level apparatus entry for the Hebrew text will be normalized to Hebrew, and the Greek/German witnesses cited by siglum in support of the Hebrew alternatives. Linguists, and some text-critical researchers, might indeed wish to study the details of these theoretical Hebrew-German/Greek mappings.
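The contrast drawn above -- SGML anchors in the electronic base text versus canonical references into a hardcopy witness -- might be sketched roughly as follows. The names <app>, <anchor>, startpoint and endpoint come from this discussion; <rdg>, wit and loc are invented here purely for illustration and correspond to nothing in the draft Guidelines:

```sgml
<!-- Base text: SGML anchors delimit the lemma -->
<anchor id=a1>In the beginning<anchor id=a2>

<!-- Apparatus: the lemma is located by ID, but a witness known
     only in hardcopy is located by its source's own canonical
     referencing scheme (the hypothetical "loc" attribute), in a
     form parsable by humans now and perhaps by machines later -->
<app startpoint=a1 endpoint=a2>
  <rdg wit="C" loc="obv. ii 14-15">variant reading cited from the
      hardcopy edition</rdg>
</app>
```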
But it could easily overburden the average encoder (in this case, of the Hebrew text) to require that multiple anchors be set in the variant texts, and that the startpoints (plural!) and endpoints of the discontinuous Greek and German text segments be mapped correctly to the elements of the Hebrew phrase: simple normalization and a witness list (omitting any variant endpoints) would be sufficient. If it is desirable to supply double-endpoint attachment notation for variant readings, perhaps this can be done under <witness-detail> (Guidelines section 5.10.10). The method outlined in 6.2.5, "explicit alignment of multiple analyses" (p. 142), would possibly work in some cases, but could get very nasty (as I imagine the process of creating the alignment) when one has hundreds or thousands of witnesses in various languages. I think only encoders who wish to specify this level of segmentation and alignment should be required to do so; it should not be expected as part of standard encoding.

(3) Requiring double-endpoint attachment notation for variant readings might be a particularly aggravating nuisance, and sometimes nonsense, when the variant text contains a zero variant. It is very easy to assert that witness C lacks the reading of the base text, but if witness C is given to erratic stylistic transpositions anyway (e.g., for adverbs), where would one (confidently) set the anchors for the null reading in the variant text? We don't usually think of a zero variant on the base text, but if we did, "double-endpoint" attachment would make less sense than a single-point locus for a simple textual minus. "Minuses" and "plusses" make more sense once the encoding and quantification is done. Do we propose that the encoding support machine-permutation of the data such that any text can become the "base" text, with all others "variant" texts? Requiring double-endpoint notation for all variant readings sounds like a manual step in this direction.
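For what it is worth, the zero-variant cases just described might look like this under double-endpoint notation; the degenerate range (identical startpoint and endpoint) for a simple plus, and the empty reading for a simple minus, are what make the notation feel strained. Again, <rdg> and wit are names invented for illustration only:

```sgml
<!-- Witness B has a plus at position a2 of the base text: the
     "range" of base text it corresponds to is empty, so the
     startpoint and endpoint coincide -->
<app startpoint=a2 endpoint=a2>
  <rdg wit="B">extra phrase attested only in witness B</rdg>
</app>

<!-- Witness C lacks the lemma between a1 and a2: an empty
     reading, easy to assert but hard to anchor in C itself -->
<app startpoint=a1 endpoint=a2>
  <rdg wit="C"></rdg>
</app>
```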
Sounds nice, but difficult, and perhaps labor-intensive in the encoding phase. These questions expose some of the many difficulties in working from a base text against which all variants are registered -- which, however, is probably inescapable.

Re: the three/four alternative systems:

> This looks to me to be three systems too many. Double end-point
> attachment might be all we need:

Nobody would argue against simplicity, of course. The TEI editors could comment more authoritatively, but my memory is that alternative schemes were proposed: (a) to encourage discussion of the theoretical problems, and (b) to accommodate text encoders in highly variable textual arenas. Someone encoding textual variation when only two exemplars are known might prefer the "parallel segmentation" method because it is more economical or perspicuous at various levels (storage, browsing, parsing, etc.). Similarly, "inline" and "external" alternatives were proposed to accommodate different encoding goals, personal preferences, corpora of widely variable textual richness, etc. If indeed there is no need for single-endpoint attachment, then we should dispense with it. Does double-endpoint attachment always make sense (e.g., for a simple plus in the variant text, could we use "<app point=a2>" rather than "<app startpoint=a2 endpoint=a2>")? Zero variants (simple plusses and minuses) point up the privilege that the base text usually receives, and the desideratum of having variation expressible from a neutral (database) perspective or from the viewpoint of any single witness. Can this be had for free -- or at a price encoders are willing to pay if 95% of their variant data exists only in hardcopy?

> I am suspicious of the system of "nesting" variants given in
> the guidelines.

I forget all the reasons advanced and discussed for nesting within <app> entries. Possible cases:

(1) Nesting would be a valuable means of grouping large-scale recensional variants.
Imagine two textual traditions that are genetically related but clearly distinct recensions, sharing only 70% of their content. Both major recensions have extensive textual variation in their daughter traditions. Nesting can be used to describe oppositions that are meaningful at the lower levels.

(2) Nesting may be a valuable means of grouping language witnesses in an encoding project that wishes to trace textual variation in several languages. Normalizations would be required at key points, but the researcher may indeed wish to encode fully all variation below a single parent-language text in the same database. This is perhaps the only way to get control over complex traditions in which retroversion of the most recent (language) levels to the earliest makes no historical sense (e.g., Hebrew 'Scriptures' 4th century BCE >> Old Greek translation(s), 3rd/2nd centuries BCE >> Sahidic Coptic translation of (Old) Greek, 3rd century CE). Of course, (1) and (2) are not mutually exclusive motivations for supporting nesting.

A procedural question (I am perplexed by Lou's posting of PR's note on TEI-L): do we wish to inflict upon **everyone** on TEI-L the gory details of encoding textual variation? Are not the TEI-REP, TEI-ANA (etc.) listserv groups still active? There is (supposed to be) a TEI Textual Criticism Working Group -- would not a listserv for this WG be a better discussion forum for the most boring details of encoding textual variation?

Robin Cover
Robin Cover's very helpful recent discussion of encoding textual variation
concluded with the following procedural question:

> ...do we wish to inflict upon **everyone** on TEI-L the gory details
> of encoding textual variation? Are not the TEI-REP, TEI-ANA (etc.)
> listserv groups still active? There is (supposed to be) a TEI Textual
> Criticism Working Group -- would not a listserv for this WG be a better
> discussion forum for the most boring details of encoding textual
> variation?

During a quiet time on TEI-L, I asked the editors about subscribing to TEI-REP, etc. I was told that these ListServs were for internal discussion of work in progress that was not yet polished enough for public consumption. Since I was not a member of the relevant committees, I would not be permitted to subscribe. This seems quite reasonable; we all show drafts to selected colleagues before we solicit general comments and suggestions. It was explained that all relevant information would eventually be distributed on the public list. Material would be restricted to private lists not because it was "boring" (i.e., of limited interest to the general readership), but because it was unfinished.

This being the case, TEI-L is the only publicly accessible forum for discussion of any TEI issues, both general and specific. Relegating textual criticism discussions to a private ListServ would make these discussions inaccessible to at least one interested party. I would prefer to receive as much information as possible and take personal responsibility for what I do and don't read, and I appreciate the recent postings on these important issues to a publicly accessible list.

Two ways of dealing with specialized or extremely technical discussion might be the following:

1) Post all information to TEI-L, where readers can select messages they wish to read according to subject headings.

2) Use specialized lists for specialized discussions, but make these publicly accessible.
This can involve either opening all specialized lists to the general public or creating a separate set of lists for open discussion of specialized questions. This new set would differ from the old specialized lists in that information would be directed there because it was specialized and of limited general interest, rather than because it was too incompletely formed for public consumption. Each of these approaches has its advantages.

The point of this posting isn't to start a procedural wrangle, but to remind those with access to the specialized lists that redirecting discussions there would exclude interested readers.

--David

===================================================================
David J. Birnbaum
[hidden email] [Internet]
[hidden email] [Bitnet]