Parsing TEI2 with SGMLS

Parsing TEI2 with SGMLS

Gregory J. Murphy
If anyone has had similar difficulties, I would appreciate any guidance
offered.

SGMLS reports a large number of undefined elements in the current version
(tei2 ?) of the TEI DTD, when run with the -u option. A list follows. I
have scrutinized their location in the DTDs carefully, and can't for the
life of me figure out where the problem originates. Almost all are elements
defined in auxiliary or additional tag sets, and thus sgmls shouldn't even
know about them unless they are included for a document instance, which,
in this case, they weren't. Go figure.

- Gregory Murphy
CETH


TABLE
MOVE
LANG
HANDLIST
SETTINGDESC
CAESURA
SPAN
CASTLIST
OREF
TEXTDESC
PVAR
TAG
VIEW
TREE
CL
FIGURE
SEG
TECH
GRAPH
HANDSHIFT
SOUND
SET
RESPONS

Re: Parsing TEI2 with SGMLS

Harry Gaylord
Gregory,

You should not be using the -u switch. P2 and P3, I think, declare
parameter entities for all tags from all sets. It is the -u switch
that causes these undefined elements to be reported.

If you were debugging a new DTD, this switch would be useful, but not
with TEI.

Harry

Re: Parsing TEI2 with SGMLS

Michael Sperberg-McQueen
In reply to this post by Gregory J. Murphy
On Wed, 15 Jun 1994 08:39:23 Greg Murphy said:

>If anyone has had similar difficulties, I would appreciate any guidance
>offered.
>
>SGMLS reports a large number of undefined elements in the current version
>(tei2 ?) of the TEI DTD, when run with the -u option. A list follows. I
>have scrutinized their location in the DTDs carefully, and can't for the
>life of me figure out where the problem originates. Almost all are elements
>defined in auxiliary or additional tag sets, and thus sgmls shouldn't even
>know about them unless they are included for a document instance, which,
>in this case, they weren't. Go figure.

The quick answer is, the -u option on sgmls is not very useful for the
TEI DTDs, and you should ignore the warnings it generates (or turn it
off).  Those not interested in technical details may --- should --- stop
reading now.

OK, now, for the technically inclined:  brief tutorial on undeclared
element types, and why they are legal.

The -u option requests sgmls to report on any SGML generic identifiers
encountered in the DTD for which no element declaration is found.  It is
*not* an SGML error if such generic identifiers are found, for reasons
which should become clear in a moment, but in some cases it can reflect
a problem in the DTD (e.g. a typo, or an unsuccessful attempt to rename
an element, say by changing all occurrences of HIGHLIGHTED to HI).  The
-u option is there to allow you to check that the DTD is 'clean' in the
sense that it doesn't mention anything you haven't declared.
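
To make the idea concrete, here is a rough sketch in Python of the kind of
check described above: collect every generic identifier mentioned in a
content model, then subtract the ones that have an element declaration of
their own.  It is only an illustration, not how sgmls itself works.  The DTD
fragment, the regular expressions, and the function name
undeclared_elements are all assumptions made for this example, and the
parsing is far too naive for a real DTD (no parameter entities, marked
sections, name groups, or exceptions).

```python
import re

# Hypothetical, heavily simplified DTD fragment (not the real TEI declarations).
DTD = """
<!ELEMENT p    - O (#PCDATA | table | move | hi)* >
<!ELEMENT hi   - - (#PCDATA)* >
"""

# One element declaration: name, two minimization flags, then the content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+\S\s+\S\s+(.*?)>", re.S | re.I)
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")
# Reserved words that can appear in a content model but are not element names.
KEYWORDS = {"EMPTY", "ANY", "CDATA", "RCDATA", "PCDATA"}


def undeclared_elements(dtd_text):
    """Return names mentioned in content models but never declared."""
    declared, mentioned = set(), set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        declared.add(name.upper())
        for token in NAME.findall(model):
            if token.upper() not in KEYWORDS:
                mentioned.add(token.upper())
    return sorted(mentioned - declared)


print(undeclared_elements(DTD))
```

With this fragment the function reports MOVE and TABLE: both are mentioned
in the content model of P, but neither has a declaration of its own.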

Unless you select ALL the additional tag sets, and ALL the bases,
though, the TEI DTDs are not clean in this way; they do mention quite a
lot of elements which are not declared.  For example:

>TABLE
>MOVE
> ...

The TABLE element is declared as legal within the content of (among
others) paragraphs, since tables SHOULD be legal within paragraphs,
whenever the tag set for tables, etc., is selected.  Similarly, MOVE is
defined as legal within paragraphs and lines, or between them, so that
it can be used within and between speeches, when dramatic texts are
encoded.  When these tag sets are not selected, the elements should not
be legal.  Defining P as

  <!ELEMENT p - O (#PCDATA | table | move | ...)* >

ensures that if you select the appropriate tag set, the elements will be
legal in the right places.  Leaving them undeclared in the core tag set
ensures, on the other hand, that if you don't select the appropriate tag
set, the elements won't be legal at all.  (Barring the case that you
define them yourself ...)
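
The effect can be modelled in a few lines of Python.  This is a toy sketch
under stated assumptions: the tag-set names ("figures", "drama"), the
abbreviated content model, and the function legal_in_p are invented for
illustration and are not taken from the TEI DTDs.  An element is usable
inside P only if it is both mentioned in P's content model and declared by
some selected tag set.

```python
# Elements mentioned in the content model of P (hypothetical, abbreviated).
P_CONTENT = {"TABLE", "MOVE"}

# Element declarations contributed by the core and by each optional tag set.
# The tag-set names are illustrative assumptions.
CORE = {"P"}
TAG_SETS = {"figures": {"TABLE"}, "drama": {"MOVE"}}


def legal_in_p(selected):
    """Return the elements actually usable inside P for the selected tag sets."""
    declared = set(CORE)
    for name in selected:
        declared |= TAG_SETS[name]
    # Mentioned in the content model AND declared somewhere.
    return sorted(P_CONTENT & declared)


print(legal_in_p([]))
print(legal_in_p(["drama"]))
print(legal_in_p(["drama", "figures"]))
```

With no tag set selected nothing extra is usable in P; selecting a tag set
makes its elements usable without any change to the declaration of P.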

So when the parser fails to find a declaration for MOVE and TABLE, even
though they are mentioned within the declaration for P, no error is
actually being detected.  (The messages issued by sgmls should be
warning messages, not error messages.)  It was because they foresaw
precisely this type of situation (users making use only of a subset of a
large publicly defined DTD) that the designers of SGML specified that
referring to undeclared elements is not an error.


-C. M. Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago
 [hidden email] / u35395@uicvm