[ The following exchange between Professor Stig Johansson of Oslo
University and Dr Robert Amsler of Bellcore may be of wider interest,
and I have therefore taken the liberty of posting it to this list,
without asking permission of either of them. Comments, reactions,
even flames, welcomed.
Lou Burnard, Assoc Ed, TEI ]
Date: Wed, 11 Jul 90 13:01:21 +0100
From: Stig Johansson <h_johansson%[hidden email]>
Subject: How far SGML?
Message-Id: <805*[hidden email]>
Does everything have to be expressed through SGML? Is it good for all
types of texts? Cf the recent book by Herwijnen (1990: 12): 'A tree
representation can be used to describe many documents, but not all.'
Even if SGML can be made to do the work, is it the best way for all
texts and all types of textual features? For example, can it be used for
the discourse transcription needed in a spoken corpus without cluttering
up the text hopelessly? Why not try to convert a discourse transcription
to SGML-type tagging?
You will say - everything does not need to be keyboarded - there are
possibilities of tag minimization - tags can be suppressed, etc. But we
do want to be able to see the text simultaneously tagged for speaker
overlap, stress, intonation, etc. You will say - but SGML is for the
interchange format - it can be converted to/from a local format. The
best thing would be, however, if we could have a usable interchange
format or, at least, if we could minimize the difference between the
interchange format and local formats. Otherwise, who will bother to use
the interchange format?
What I am really asking is: what harm does it do if we can find some
simpler way of expressing, for example, intonation or word class?
Finally, are we really sure that we can define in advance a grammar for
textual features? This is what the DTD does (for a particular type of
These are some of the questions that need careful consideration. Or am I
just being stupid?
Oslo 25 June 1990
From: Robert A Amsler <[hidden email]>
Subject: Re: 3d note from SJ
Re: Stig's concerns over whether SGML should always be used to represent
There has been one category of material which from the beginning I have
concluded should not be represented in SGML. From this category I
believe one can abstract a general principle of what types of material
are not best served in SGML garb.
The category was programming language code. The reason was that since a
compiler existed for such code, it was not only unnecessary to convert
such material into SGML, it was outright harmful.
Since then, I've come to conclude that DBMS loader stream information
likewise should not be converted into SGML tagging. The reasoning is
similar, that it is a formal language already, which is checked by its
own software. In the case of database data, however, it was concluded
that there might be another form of the information from which the
database loader stream could be derived, and that this other format
might itself be an SGML format.
The general principle seems to be that what makes using SGML worthwhile
is that software exists to make use of the format and software exists to
validate the content of the format. Whenever other software exists for
another format, the utility of SGML is lessened to the degree that this
other software provides all the formal computational capabilities that
However, there is another principle which should be stated.
It is that most criteria which concern `readability' or `legibility'
for human users of text are in fact derivable by software from less
readable and less legible forms of information which are more useful to
computers. Using a computer is a bit of a Faustian deal. We gain
capabilities to do things, but lose a bit of the ability actually to
understand the raw information from which these capabilities derive.
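As a rough illustration of that last principle, here is a minimal sketch of
software deriving a readable transcript from a tagged one. It is not taken
from either correspondent. The element names (u, stress, overlap), the who
attribute, and the display conventions (capitals for stress, square brackets
for overlap) are all invented for the example and come from no actual DTD.

```python
import re

# Hypothetical SGML-style markup for two overlapping utterances.
# The tag and attribute names are assumptions made for this sketch.
TAGGED = (
    '<u who="A">I <stress>never</stress> said <overlap>that</overlap></u>\n'
    '<u who="B"><overlap>oh</overlap> you did</u>'
)


def render(tagged: str) -> str:
    """Derive a reader-friendly transcript from the tagged form."""
    lines = []
    # Each <u> element is one utterance; 'who' names the speaker.
    for who, body in re.findall(r'<u who="([^"]+)">(.*?)</u>', tagged, re.S):
        # Show stressed words in capitals.
        body = re.sub(r'<stress>(.*?)</stress>',
                      lambda m: m.group(1).upper(), body)
        # Show overlapping speech in square brackets.
        body = re.sub(r'<overlap>(.*?)</overlap>', r'[\1]', body)
        lines.append(f"{who}: {body}")
    return "\n".join(lines)


print(render(TAGGED))
```

A regular expression is enough here only because the sample has no nested or
minimized tags; real SGML would need a proper parser. The point is just that
the cluttered form can be kept for interchange while the legible form is
generated on demand.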