>Ahem, I've been listening in to the discussions on this list for a
>while, and I think this current debate on un-smgl-ifying (!) text
>is quite beside the point. SGML adds structure to and information
>about a text. This cannot just be "stripped" away, without major
>loss of usefulness of the text. I'm opposed to the very idea of
>stripping off information just like that!
I agree with most of what Naggum posted in the message which is quoted
above, but the underlined sentence above is wrong. SGML adds no structure to
the text. SGML describes the text and makes the structure clearer, but the
structure was there when the author wrote the text. Perhaps the author didn't
even consciously know the structure he laid down, just as a 6-year-old can't
diagram the sentences he speaks. Also, the structure that one person sees in
the text may not be the same as the one every other person sees. So adding TEI-conformant
tags to a previously-only-printed text can be a creative work of its own,
subject to criticism, but it is not adding anything but usefulness to the text.
What Michael Hart does not seem to realize is that his plain texts are
marked up, too. Perhaps there are tabs at the beginning of paragraphs.
Certainly he likes punctuation at the end of sentences and spaces between
words. These are all clues about the structure of documents. Each sentence of
a text has its own structure, which most people either don't need to study or
find apparent when they do need to know it for a particular sentence.
If I <not.a.real.word>electronify</not.a.real.word> a text and tag all words
with their part of speech, I have not added to the text at all. Software
should and <emph>will</emph> eventually make my part-of-speech tags invisible
or gone if the next user doesn't want them.
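
As a rough illustration of that last point, here is a minimal Python sketch of software hiding tags from a reader who doesn't want them. The name `hide_tags` and the regular expression are my own assumptions for illustration only; a real SGML parser has to cope with entities, comments, minimization and much else that this ignores.

```python
import re

# Hypothetical pattern: anything that looks like <tag ...> or </tag>.
# This is a simplification, not a real SGML parser.
TAG = re.compile(r"</?[A-Za-z][^>]*>")

def hide_tags(marked_up: str) -> str:
    """Return the text with SGML-style tags removed and their content kept."""
    return TAG.sub("", marked_up)

sample = (
    "If I <not.a.real.word>electronify</not.a.real.word> a text, "
    "software <emph>will</emph> hide the tags."
)
plain = hide_tags(sample)
print(plain)
```

The tagged file stays the master copy; the untagged view is derived from it on demand, so nothing is lost for the next user who does want the tags.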
The problem with distributing a text which only has the markup that can
be denoted by punctuation and spaces is that some of the features in the texts
that Hart wants to make electronic cannot be represented without SGML-type
markup. I ask Hart now how he denotes italicised and boldfaced words, how he
denotes chapters and verses, how he denotes page numbers from the original
printed text? A conservative e-editor of a text will mark them something like
<highlighted rendition=italic>this</highlighted>, while one with some guts will
interpret the text and mark it like <emph>this</emph>. Hart seems to want to
make his texts fit a minimum standard, which meets the requirements of
readability and quotation-finding-ability, and, he notes, it is difficult with
most of today's word processors to find "now is the time" when the actual text
says "<emph>now</emph> is the time." However, finding "now is the time" is
not finding what is actually in the text. The author thought that "now" should
be emphasized. Hart's reader doesn't know that unless he consults the printed
source, and he'll have a hard time finding the line in the book when he doesn't
know what page it's on. Consider this example, which reproduces part of the text, but
not the formatting, of the preface of the TEI Guidelines:
This document represents the first fruits of the Text Enco-
ding Initiative, a four year international research project
begun in June 1988, the goal of which is to provide Guide-
lines for the encoding of electronic texts.
If I search this text in normal word processor fashion for the word
"encoding," then it will not show up, due to the markup "-". Therefore
the search will miss this instance of the word, and anyone hoping to count
"encoding"s will be at least one short.
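
To make the count concrete, here is a small Python sketch using the four lines quoted above. It rests on an assumption that is often false: that a "-" at the end of a line always marks a word broken by layout and never a genuine hyphen. The names `dehyphenate` and `count_word` are hypothetical, chosen only for this example.

```python
import re

preface = (
    "This document represents the first fruits of the Text Enco-\n"
    "ding Initiative, a four year international research project\n"
    "begun in June 1988, the goal of which is to provide Guide-\n"
    "lines for the encoding of electronic texts.\n"
)

def dehyphenate(text: str) -> str:
    """Join words split by a hyphen at a line end (assumed to be layout only)."""
    return re.sub(r"-\n", "", text)

def count_word(text: str, word: str) -> int:
    """Count case-insensitive whole-word occurrences of `word`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

naive = count_word(preface, "encoding")                  # search the text as stored
repaired = count_word(dehyphenate(preface), "encoding")  # search after joining
print(naive, repaired)
```

The naive search misses the instance broken across the first two lines, so its count comes out one lower than the count on the rejoined text. The joining step is itself a guess about structure, which is exactly the information that explicit markup would have recorded.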
So what are my points? SGML adds nothing to the text. Hart's plain
format is not the complete text, but an unacceptably abridged form of it.
Hart's plain format is not unmarked, but is rather a degree of markup decided
on because it resembles (but hasn't the robustness of) the printed form.
Hart's plain format doesn't even do everything he demands of it and remain true
to the original text.
Thanks for reading this far.
> >Ahem, I've been listening in to the discussions on this list for a
> >while, and I think this current debate on un-smgl-ifying (!) text
> >is quite beside the point. SGML adds structure to and information
> >about a text. This cannot just be "stripped" away, without major
> >loss of usefulness of the text. I'm opposed to the very idea of
> >stripping off information just like that!
> >[Erik Naggum]
> I agree with most of what Naggum posted in the message which is quoted
> above, but the underlined sentence above is wrong. SGML adds no structure to
> the text. SGML describes the text and makes the structure clearer, but the
> structure was there when the author wrote the text.
I don't disagree with your comment, so just let me clarify. SGML
enables the structure to be communicated between author and reader, in
addition to the text. The structure was there, in whatever concrete
form the author chose to represent it, but that is not necessarily
communicable as such. Witness the introductions found in most books
dealing with linguistics, syntax, etc, where they often go to enormous
details about the meaning of the physical renditions used to represent
abstract ideas. Putting boxes around characters or words in entry-level
computer user manuals to represent keys on the keyboard, for instance,
or the intense discussions in dictionaries about layout specifics.
So, I would still contend that SGML adds structure to the _text_, but
not appreciably to the _meaning_ of the text, since that has always
been in the head of the author, anyway. Communicability wins, though.
Naggum Software, Oslo, Norway