BNF and SGML

Michael Sperberg-McQueen
John Baima asks whether there is a BNF form of the TEI DTD.  A while
back, Richard Goerwitz asked whether there was a BNF definition
(actually, any 'concise syntactic summary' or 'clean, short summary') of
(the TEI subset of) SGML.  These are distinct questions, and both should
be answered.

Richard Goerwitz first.  There is definitely a formal grammar defining
the syntax of SGML -- ISO 8879 uses formal grammar productions to define
the form of an SGML document.  Though not strictly BNF, it's fairly
close.  The difficulty in writing a BNF equivalent for use in
syntax-driven programs is that the grammar was clearly not written with
automatic parser generators in mind:  some productions, while clear
enough in their intent, would be hard for a generator to handle.  Also,
a lot of details are conveyed only in the accompanying prose, not in the
formal productions.  In a couple of cases, the productions seem to me to
be downright misleading and to contradict the prose (but I'm not really
an expert).

A formal description of the TEI subset of SGML does sound like a good
idea; I've been working on something similar for a while, when I get the
chance (i.e. rarely), and if there is serious interest I will try to
finish it.

John Baima's question I interpret to mean "is there a BNF definition for
TEI documents?" and not "... for TEI DTDs", since the DTDs are described
by the formal grammar of ISO 8879, and that part of the grammar is
relatively clean and simple.

In some sense, the formal grammar of ISO 8879 describes TEI documents,
and one should be able to parse TEI documents using it or some
close equivalent.

Validating the documents, however, is more complicated.

The DTD itself provides a formal description of TEI documents, using a
regular-right-part grammar (that is, the right-hand side of a production
can contain regular expressions, which is a slight enrichment over
Backus-Naur Form, I think).  Other features of SGML (notably inclusion
and exclusion exceptions) make it rather hard to write strict BNF
equivalents, and largely as a result I doubt that BNF parser
generators are going to be as useful a tool for SGML validation as they
are in other contexts.  This is one reason many computer scientists
shake their heads mournfully when you mention SGML to them.
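
To make the regular-right-part point concrete, here is a minimal sketch
in Python.  It treats a content model as a regular expression over the
names of an element's children, and handles an inclusion exception by
simply ignoring the included names before matching.  The element names
(head, p, list, note) and the content model are illustrative
assumptions, not copied from the TEI DTD, and the sketch deliberately
refuses '&' connectors and #PCDATA.  Real SGML inclusions also reach
into descendant elements, which this does not attempt.

```python
import re

# A token is an element name or one of the content-model operators.
TOKEN = re.compile(r"[A-Za-z][\w.-]*|[(),|?*+]")

def model_to_regex(model):
    """Translate a simple SGML content model into a regex over child names."""
    if "&" in model or "#" in model:
        raise ValueError("'&' connectors and #PCDATA are not handled in this sketch")
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")        # plain grouping, no capture needed
        elif tok == ",":
            continue                   # a sequence is just concatenation
        elif tok in ")|?*+":
            parts.append(tok)          # same meaning in both notations
        else:
            # one child name followed by a single-space separator
            parts.append("(?:" + re.escape(tok) + " )")
    return "".join(parts)

def children_match(model, children, inclusions=frozenset()):
    """Check a list of child element names against a content model.

    Names listed in `inclusions` may appear anywhere, so they are dropped
    before matching (a simplification of SGML inclusion exceptions).
    """
    kept = [name for name in children if name not in inclusions]
    text = "".join(name + " " for name in kept)
    return re.fullmatch(model_to_regex(model), text) is not None

MODEL = "(head?, (p | list)+)"   # hypothetical content model
print(children_match(MODEL, ["head", "p", "list"]))          # True
print(children_match(MODEL, ["p", "head"]))                  # False
print(children_match(MODEL, ["p", "note", "p"]))             # False
print(children_match(MODEL, ["p", "note", "p"], {"note"}))   # True
```

The last two calls show why exceptions are awkward for strict BNF:  one
set of included names changes what every content model beneath the
element accepts, where a BNF grammar would need each affected production
rewritten.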

Since any document may modify the standard TEI DTD, software for TEI
validation must be able to parse from a formal grammar presented at run
time -- this is like building yacc into your application program.  It's
not impossible, but it is simpler to work with an existing SGML
processor.
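
As a rough illustration of what "a grammar presented at run time"
involves, the following Python sketch reads element declarations from a
string, compiles each content model into a regular expression, and then
checks a small tree against the result.  Everything here is an
assumption made for the example:  the toy DTD is not the TEI DTD, the
tree is a hand-built (name, children) tuple rather than parser output,
text content is ignored, and '&' connectors, exceptions, entities and
EMPTY/ANY declarations are not handled.

```python
import re

# Matches e.g. <!ELEMENT div - - (head?, (p | list)+)>, minimization flags optional.
DECL = re.compile(
    r"<!ELEMENT\s+([A-Za-z][\w.-]*)\s+(?:[-Oo]\s+[-Oo]\s+)?(\(.*?\)[?*+]?)\s*>",
    re.S,
)
TOKEN = re.compile(r"#PCDATA|[A-Za-z][\w.-]*|[(),|?*+&]")

def compile_model(model):
    """Compile one content model into a regex over space-terminated child names."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "&":
            raise ValueError("'&' connectors are not handled in this sketch")
        if tok == "(":
            parts.append("(?:")
        elif tok in ")|?*+":
            parts.append(tok)
        elif tok not in (",", "#PCDATA"):   # commas concatenate; text is ignored
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def load_grammar(dtd_text):
    """Build the validator table from declarations supplied at run time."""
    return {name: compile_model(model) for name, model in DECL.findall(dtd_text)}

def validate(grammar, tree):
    """tree is (name, [child trees]); returns a list of error strings."""
    name, children = tree
    errors = []
    pattern = grammar.get(name)
    if pattern is None:
        errors.append(f"undeclared element: {name}")
    elif not pattern.fullmatch("".join(child[0] + " " for child in children)):
        errors.append(f"bad content in {name}")
    for child in children:
        errors.extend(validate(grammar, child))
    return errors

TOY_DTD = """
<!ELEMENT div  - - (head?, (p | list)+)>
<!ELEMENT list - - (item+)>
<!ELEMENT head - - (#PCDATA)>
<!ELEMENT p    - - (#PCDATA)>
<!ELEMENT item - - (#PCDATA)>
"""

grammar = load_grammar(TOY_DTD)
good = ("div", [("head", []), ("p", []), ("list", [("item", [])])])
bad = ("div", [("p", []), ("note", [])])
print(validate(grammar, good))   # []
print(validate(grammar, bad))    # ['bad content in div', 'undeclared element: note']
```

Even this toy has to contain a small grammar compiler, which is the
point:  the validator cannot be generated once in advance, because the
grammar is not known until the document arrives.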

So:  no BNF in the strict sense, but something close as to the
structure of tags and content, and something less close as to the
legal combinations of tags.

Michael Sperberg-McQueen