text-critical tagging


Michael Sperberg-McQueen
Many thanks to Peter Robinson and Robin Cover for their postings on
the thorny issues of textual criticism.  They have raised some questions
of fact which I may be able to answer, as well as a lot of questions of
policy, convenience, etc., which need general discussion (and to which
my two cents' worth follows).

First, on procedure:  should text-criticism be discussed on TEI-L?  I
believe so.  TEI-L is for public discussion of the TEI and its work, and
that must inevitably sometimes include some rather technical issues.
Those not interested in the particular technical questions being
discussed must simply be patient, and maybe get some practice with their
delete keys, just as those technically proficient in SGML have been very
patient during the discussions of basic issues and questions on this
list.   (This concludes the general-interest portion of this message.
If you are not interested in textual criticism, you should stop
reading right *now*.)

1.  The four methods.  I believe everyone agrees that four methods of
encoding textual variants are too many, and that we eventually need to
cut down to one or two methods.  The draft presents all four methods
(plus variations) that we could think of, because there was no consensus
as to their relative strengths.  I am grateful to Peter Robinson for
coming out so strongly in favor of one scheme -- this is the kind of
evaluation and judgement we need to make about text-critical tags in
this revision cycle.

2.  Peter Robinson is right to say double-attachment is the only method
*needed*, in the sense that it provides all the information needed for
an encoding using any of the other methods.  That isn't an argument for
double attachment, though, since the same holds for all four methods:  they all
provide substantially the same information about the witnesses, and may
be regarded as notational variants of each other.  (The only exception
is that parallel segmentation and the other methods may treat
overlapping lemmata differently, making translation back and forth
tricky:  it can be done, but the results may look odd.)

3.  I don't understand what is meant by the suggestion to mark the end
points of the variation (not 'the lemma' surely!) 'in the variant'.
Where is the variant text?

Are we assuming the witness is held in electronic form in a separate
document or section?  If so, then I agree that marking the variations
would be useful.  The problem in this case is to mark points of
synchronization between/among parallel texts, and the techniques
described for parallel texts should prove useful and adequate.

If the witness is not held separately (a common case, I suspect, even
when the witnesses are fewer than a thousand), but the collation has
been thorough, then its readings should be reconstructible by a fairly
simple processing of the base text and apparatus, and the end-points of
the variations are marked implicitly by the APP and RDG tags.  In this
case, what more is needed?  (To answer Robin Cover's question, I for one
certainly expect any notation for apparatus to be able to support
mechanical reformulation of the apparatus using any arbitrarily chosen
witness or set of readings as the base -- given, of course, that the
collations are complete.  I regard this as a basic requirement.)
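
As a rough illustration of the "fairly simple processing" meant here, the
following Python sketch rebuilds a witness from a base text plus apparatus.
The data model is an assumption made for the example, not anything in the
draft: the base text is a list of word tokens, and each apparatus entry
(the hypothetical App class) names the span of base tokens it replaces and
the readings of the witnesses that differ.  Entries are assumed not to
overlap or nest.

```python
from dataclasses import dataclass

@dataclass
class App:
    """One apparatus entry (hypothetical model, not the TEI draft's)."""
    start: int       # index of the first base token covered
    end: int         # index one past the last base token covered
    readings: dict   # witness siglum -> list of tokens read by that witness

def reconstruct(base, apps, witness):
    """Return the running text of `witness` as a list of tokens.

    A witness not named in an entry is taken to agree with the base text,
    which is only safe if the collation is complete.
    """
    out = []
    pos = 0
    for app in sorted(apps, key=lambda a: a.start):
        out.extend(base[pos:app.start])            # unvaried stretch
        if witness in app.readings:
            out.extend(app.readings[witness])      # the witness's own reading
        else:
            out.extend(base[app.start:app.end])    # agrees with the base
        pos = app.end
    out.extend(base[pos:])
    return out

base = ["The", "quick", "brown", "fox"]            # base text, witness A
apps = [App(1, 2, {"C": ["sleek"]})]               # C reads 'sleek' for 'quick'

print(" ".join(reconstruct(base, apps, "C")))
print(" ".join(reconstruct(base, apps, "A")))
```

Under these assumptions one pass over the sorted entries is enough; nested
or overlapping entries would need more bookkeeping.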

4.  The parallel-segmentation method really shines in this respect:
extraction of any given text is substantially simpler there than for any
of the other methods, since the beginning and ending of each variation
are explicitly marked as such.  The double-attachment method marks the
beginnings and endpoints explicitly, but since ANCHOR is a generic
position marker, not unique to apparatus entries, you can never know until
you've scanned the entire text for APP entries whether there is any
variation on a particular point in the base text.  Using parallel
segmentation, you always know when you enter a variation.  We could
clone ANCHOR to get an APPARATUS-START tag, but unless we require
exactly one such tag for each apparatus entry, we still won't match the
performance of parallel segmentation.
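
A minimal sketch of why extraction is simple under parallel segmentation,
written in Python.  The representation is an assumption for illustration
only: the text is a list of segments, each either a plain word or a
variation, and a variation is a list of (witnesses, reading) pairs whose
readings are themselves lists of segments, so that variations can nest.

```python
def extract(segments, witness):
    """Return the tokens of `witness` from a parallel-segmented text."""
    out = []
    for seg in segments:
        if isinstance(seg, str):
            out.append(seg)                 # text common to all witnesses
        else:
            # Entering a variation: take the reading that lists the witness.
            for witnesses, reading in seg:
                if witness in witnesses:
                    out.extend(extract(reading, witness))
                    break
    return out

# Witnesses A and C agree on the main line but differ on one word;
# B has a different reading for the whole phrase.
inner = [({"A"}, ["quick"]), ({"C"}, ["sleek"])]
text = [
    [({"A", "C"}, ["The", inner, "brown", "fox"]),
     ({"B"}, ["A", "silver", "wolf"])],
]

for siglum in ("A", "B", "C"):
    print(siglum, " ".join(extract(text, siglum)))
```

Because every variation is a distinct kind of segment, the walk knows it
has entered one the moment it meets it, with no prior scan of the file.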

5.  I don't understand Peter Robinson's point about the tags for
alignment of multiple analyses.  I understand their relevance, but I
don't understand what PR was saying about them.

6.  Nesting variants.  It is common, at least in the text criticism I've
read, to work with groups of witnesses which agree in the basic line of
their reading, even when they have minor variations within the group.
Robin Cover has given some vivid examples of the advantages of such
nesting, which I agree with.  The notation for nested variations is
required to allow the expression of such groupings.

Since the notation for nested variants is exactly the same as that for
non-nested variations, I don't see why nested variations are more
confusing than others.  In both cases, the base text is given, followed
by the apparatus entry.  If it is confusing to print 'The quick'
followed by an entry saying C reads 'sleek' not 'quick', why is it not
confusing to print 'The quick brown fox' followed by an entry saying
that B reads 'A silver wolf'?  Any confusion here results from the
double attachment method, not the nesting, I think.  Parallel
segmentation avoids this problem (if it is one) by denying any
structural role to the lemma.

7.  Attaching a variant to a point instead of a span.  Robin Cover
points out that additions in the variants (or omissions in the base
text) require the lemma to occupy zero space, not a span.  True.
Instead of

    <app point=a2>

however, which leaves APP with attributes for STARTPOINT, ENDPOINT, and
POINT, together with rules about permissible combinations that cannot be
enforced in SGML, I'd suggest

    <app startpoint=a2 endpoint=a2>

which conveys the same information and eliminates the dichotomy between
additions and omissions-or-changes.
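
To show that a zero-width span needs no special case in processing, here is
a small Python sketch.  It assumes, purely for illustration, that anchors
have been resolved to token offsets in the base text; the anchor names, the
offsets, and the added word 'very' are made up for the example.

```python
def apply_reading(base, anchors, startpoint, endpoint, reading):
    """Replace the base tokens between two anchors with `reading`.

    When startpoint == endpoint the span is empty, so the reading is
    simply inserted at that point: an addition is a change of nothing.
    """
    i, j = anchors[startpoint], anchors[endpoint]
    return base[:i] + reading + base[j:]

base = ["The", "quick", "brown", "fox"]
anchors = {"a1": 1, "a2": 2}    # hypothetical anchors before and after 'quick'

change   = apply_reading(base, anchors, "a1", "a2", ["sleek"])
omission = apply_reading(base, anchors, "a1", "a2", [])
addition = apply_reading(base, anchors, "a2", "a2", ["very"])

print(" ".join(change))
print(" ".join(omission))
print(" ".join(addition))
```

The same function handles changes, omissions, and additions, which is the
practical benefit of dropping a separate POINT attribute.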

8.  If I'm right in saying all of the encoding methods record the same
information and thus can be translated mechanically among themselves,
then presumably the choice of one or two methods to carry forward into
the next draft must be made on perspicuity and ease of processing.
That suggests we should consider what kinds of processing are to be
supported.  Here's a quick list to start with; from a text with
variants, I want to be able to:

    a.  transform the file into EDMACS format so I can print a
    text with apparatus using TeX.

    b.  extract the running text of any given witness.

    c.  extract summary information on variations in the style of
    Greg, Quentin, and Dearing, thus:

         quick A : sleek C
         The quick (sleek C) brown fox AC : A silver wolf B
         A : C
         AC : B

         etc.

    d.  filter the text, retaining only readings from specified
    witnesses or groups.

    e.  mechanically translate the text using a different witness
    as a base ms.

    f.  for any given point on the text, show what variants are
    open at that point (e.g. to bold all words of the base text
    which have variants opposing them, in an editor)

This isn't necessarily complete, but it's what comes to mind first thing.
Additions gratefully accepted.
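
For items (c) and (d), a brief Python sketch of what such processing might
look like once the variations have been extracted.  The entry format, a
list of (reading text, sigla) pairs with the lemma first, is an assumption
for the example and is not tied to any of the four encoding methods.

```python
def summarize(entry):
    """Return the full and the sigla-only summary line for one entry."""
    full = " : ".join(f"{text} {''.join(sigla)}" for text, sigla in entry)
    groups = " : ".join("".join(sigla) for _, sigla in entry)
    return full, groups

def filter_entry(entry, wanted):
    """Keep only readings attested by at least one wanted witness."""
    kept = []
    for text, sigla in entry:
        remaining = [s for s in sigla if s in wanted]
        if remaining:
            kept.append((text, remaining))
    return kept

entries = [
    [("quick", ["A"]), ("sleek", ["C"])],
    [("The quick (sleek C) brown fox", ["A", "C"]), ("A silver wolf", ["B"])],
]

for entry in entries:
    full, groups = summarize(entry)
    print(full)
    print(groups)

print(filter_entry(entries[0], {"A"}))
```

Note that the nested reading in the second entry is carried here as
ready-made text; producing it from the encoding is the harder part and is
not attempted in this sketch.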

-Michael Sperberg-McQueen