compound documents

Michael Sperberg-McQueen
About compound documents in SGML and in the TEI.  R.P. Weber asked a
week ago "could someone please explain the TEI approach to compound
documents and images?  Will SGML be used here, and if so, how?"

Apologies for my delay in answering.  I was hoping one of our hypertext
sages might weigh in with a reply.  (But he appears to have been in the
Caribbean, and may not have received the query.)

This problem has not, in fact, been discussed on this server or, as far
as I'm aware, by the working committees.  So the formal answer is that no
decision has been cast in concrete yet.  Which allows me to turn the
tables and say "How *should* compound documents be encoded for
interchange?  What are the requirements?  What are the alternatives?"

Less formally, I can offer some personal opinions, for what they are
worth (face value:  two cents -- 2¢, for those of you on IBM mainframes
with real IBM terminals).

Certainly SGML is where we should start in any search for ways of
handling compound objects, and I don't yet know any reason that SGML
won't provide a solution for the problem.  I assume there are two
methods of using SGML in compound documents (correct me if I'm wrong):
(1) use SGML to organize the compound document (i.e. have an SGML
document which includes text, images, sound, etc. as its components), or
(2) use whatever-you-like to organize the compound document as a whole,
and use SGML as the notation for the textual components of the compound
document.

For the simple case of text-with-illustrations, SGML seems to me a viable
encoding mechanism (both for the envelope and for the text components).
It allows you to encode the graphics however you like, declaring your
graphics format as a non-SGML notation and declaring the contents of
your graphics elements (say 'PICTURE' or 'BLORT') as being data in that
notation, stored either within the SGML file or externally to it.  You
get localization of the graphics within the text stream, integral or
separate storage of the graphics, and complete freedom to choose
whatever graphic notation you wish.
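
A minimal sketch of what such declarations might look like, assuming a
TIFF graphic and a file called fig1.tif.  Every name here ('tiff',
'tiff-viewer', 'fig1', 'file', 'format') is invented for illustration and
comes from no existing DTD or TEI draft:

```sgml
<!-- Hypothetical names throughout; a sketch, not a recommended DTD -->

<!-- Declare the graphics format as a non-SGML notation -->
<!NOTATION tiff SYSTEM "tiff-viewer">

<!-- External storage: the graphic lives in its own file, declared
     as a data entity in the tiff notation -->
<!ENTITY fig1 SYSTEM "fig1.tif" NDATA tiff>
<!ELEMENT picture - O EMPTY>
<!ATTLIST picture file ENTITY #REQUIRED>

<!-- Internal storage: the data sit inside the element itself, and a
     NOTATION attribute names the notation of that content -->
<!ELEMENT blort - - CDATA>
<!ATTLIST blort format NOTATION (tiff) #REQUIRED>
```

In the text stream, the external graphic would then be placed with a tag
such as <picture file="fig1">.  The internal variant presumably requires
that the data contain nothing the parser could mistake for the end-tag,
so in practice the graphic might have to be stored in some encoded form.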

As document encoding methods go, SGML is fairly hospitable to graphics
and other non-text pieces of compound objects.  Nowhere in the standard
does it say that the data have to be words and characters.  In fact, as
far as I know there is no *explicit* requirement in the standard that an
SGML document even has to be bytes in a computer.  (Sure, it's hard to
understand the standard any other way, but that's not the same as an
explicit requirement.)  ISO 8879 par. 6.1 note 1 says in fact "This
International Standard does not constrain the physical organization of
the document within the data stream, message handling protocol, file
system, etc., that contains it."  At the SGML '89 conference last
October in Atlanta, there was a very nice paper by Douglas MacLeod (read
by Yuri Rubinsky) thinking about architectural designs as SGML
documents, which led to a general discussion of SGML definitions for all
sorts of objects, including automobiles.  Although most people
(obviously) think of the SGML document as an electronic *description* of
the automobile (and the physical automobile as a side effect of
processing), it appears, in the light of the passage cited, hard to say
categorically that an automobile itself could never be parsed as an SGML
document.  (If you could figure out how to define the delimiters.)

The only hitch is that the SGML standard itself (ISO 8879) does not
specify in any detail what the interface between SGML processors and
non-SGML processors must, may, or can look like -- an advantage, if you
will, in that it doesn't constrain anyone to an inappropriate model, but
a bit of a disadvantage in that most people don't have a clue what they
can now or will eventually or might someday be able to do with SGML and
graphics processors.

Not being deeply involved in graphics work or compound documents myself,
I don't know off-hand what options are offered for this sort of thing by
existing SGML processors.  There will certainly be a fierce market
demand for it, not only from humanists but also (to our great advantage)
from the defense industry, which needs SGML support for technical
manuals with diagrams (and of course cross-references and other
hypertext mechanisms) and has the small change to pay for the
development costs.  (As long as they don't charge the humanists
defense-contractor prices!)

If for some reason one does *not* want to use SGML as the envelope for
the entire compound document, then presumably the major requirement for
the text-components of the compound documents is that they be
computationally well-behaved, with a clearly defined structure, hooks
for pointers going out, and hooks for pointers coming in.  SGML
certainly has all of this, in its document type declarations and its ID
names and its IDREF pointers.
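
As a rough illustration of those hooks, here is a sketch with invented
element and attribute names ('section', 'xref', 'id', 'target'); it
assumes nothing about what the TEI will eventually specify:

```sgml
<!-- Hypothetical declarations, for illustration only -->
<!ELEMENT section - - (#PCDATA | xref)*>
<!-- Hook for pointers coming in: a unique identifier -->
<!ATTLIST section id ID #IMPLIED>

<!ELEMENT xref - O EMPTY>
<!-- Hook for pointers going out: a reference to some ID -->
<!ATTLIST xref target IDREF #REQUIRED>

<!-- Hypothetical use in the document instance -->
<section id="intro">Some introductory text.</section>
<section id="body">As noted in <xref target="intro">, ...</section>
```

An IDREF can point only at an ID within the same SGML document, so a
pointer out to a non-SGML component would need some other mechanism (an
ENTITY attribute, perhaps).  Whatever organizes the compound document
could, one assumes, address the section above by its ID.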

Perhaps those subscribers to this list who actually work with compound
documents and SGML will be willing to say how they make things work now,
and how they would like to see things developing in the future.

All this is, I repeat, just personal opinion and shouldn't be taken as
defining "the" position of the TEI.  (Unless, of course, taking it as
"the" position will help get a discussion started.)

-Michael Sperberg-McQueen
 University of Illinois at Chicago