delayed mail from Robert Amsler

Michael Sperberg-McQueen
(This note was lost for unexplained reasons.  -CMSMcQ)

Date:         Tue, 2 Oct 90 15:03:45 CDT
From:         Robert A Amsler <[hidden email]>
Subject:      SGML and `universal' readability

I used to believe in the idea of trying to develop a system for
encoding text that would be readable both by people and by computers,
but I now believe that is neither possible nor realistic.

In the first place, the representation that computers use keeps
getting more elaborate. It USED to be that text in computers
had to be all upper case. Different keypunches and different
printers interpreted the same punctuation codes differently, and
some symbols were never available at all. Then came ASCII and
EBCDIC standardization, and while people in the USA feel things
have stabilized, Europeans, Asians, and countless others ask
`what about my language's alphabet?'
We are actually slightly beyond ASCII right now with encodings
attempting to hold on to things like inverse video, half-intensity,
and underlining. Other devices want to know which font or color
to use.

It seems to me impossible to keep pretending that there is such
a thing as `clean' text; rather, there may be such a thing as
`reduced' text. The issue is whether everyone will be happy with
`reduced' text. Clearly, computers cannot reliably restore
`reduced' text to its original form. Colorization, the
computer-assisted technique of adding color to black-and-white
movies, shows how traumatic the process of going from `reduced'
text back to fully specified text could be. Maybe I am missing
something, but Mr. Hart seems to be
almost nostalgic about a form of reduced text called ASCII that
has already become merely a historic step in the evolution of
text representation in computers.

One other comment: if you really think there could be a simple
representation of the information about a text's content and
presentation parameters, you should ask why bibliography isn't
written in simple English rather than in a highly complex language
with books of rules. The reason is that the details people are
trying to represent have no clear English form that would be
better. There are literally hundreds, if not thousands, of details
about a printed work (i.e., its bibliographic description) that
someone might want to know. The effort to represent these concisely
and CLEARLY has
led to the current bibliographic practices. Far from being a botched
job of cluttered notational conventions, they represent a
best effort by a considerable number of sincere scholars.

SGML is a powerful new methodology for specifying text
representation languages. It may look cluttered, but
I would venture that ANY other system which attempted to preserve
the same information would merely be an alternate type
of clutter. The only way to preserve the kinds of
information that anyone and everyone might want to have about a text
is through a similarly complex system. The best we can hope
for is that everyone agrees to one standard way instead of
each doing it their own way. I really doubt the `best' solution
is to elect to preserve only the lowest common denominator
of every representation. SGML is a real standard, i.e., it
has run the gauntlet of the necessary national and international
approvals. It would be nice if there were more such systems
from which to choose, but it took several years to get
SGML approved and I don't see any counter proposals with
comparable credentials.