I'm glad that my post provoked a flurry of suggestions. Thanks also
to those who replied privately.
I think there are several conclusions that can be drawn, but I
shan't try to do so until we've had an answer to my next question,
which is this:
A. has a word-processed text including simple control codes which
he wishes to replace by SGML markup for italics, bold, and other
emphasis. He also wishes to add simple markup for the main components
of the text structure (paragraph, sentence, etc.). He wants to do
all this automatically on his micro, before adding more detailed/specialized
comments manually.
What software should he use? We'll assume he is using a PC.
> A. has a word-processed text including simple control codes which
> he wishes to replace by SGML markup for italics, bold, and other
> emphasis. He also wishes to add simple markup for the main components
> of the text structure (paragraph, sentence, etc.). He wants to do
> all this automatically on his micro, before adding more detailed/specialized
> comments manually.
> What software should he use? We'll assume he is using a PC.
Hmmm. "Italics, bold, and other emphasis" are elements of specific
markup. Generalized markup would have such things as "quote",
"emphasis", "keyword", "foreign word", "identifier", etc. The writer
has made several choices about how to represent these abstract entities
on his limited word processor. He should, therefore, find every instance
where he has chosen to use italics, work out the abstract idea behind
that choice, and then tag it accordingly. This cannot be done entirely
automatically, although automation can help. Powerful tools are needed
to make this easy enough for the vast amount of existing information to
be structured and encoded in finite time. Ideas on this were what I
thought the Text Encoding Initiative was all about. "Converting" from
/italics/ to <italics>italics</> isn't my idea of text encoding.
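A minimal Python sketch of the kind of assisted tagging meant here, assuming the italic runs have already been rendered in the /slash/ notation used above. The function names, the slash convention, and the element names (emphasis, foreign, keyword) are illustrative assumptions, not any agreed tag set. The program only finds the runs; the `classify` callback is where a person (or a heuristic) decides what each italic actually meant.

```python
import re
from typing import Callable

# Assumed input convention: an italic run is delimited by slashes, e.g. /sic/.
ITALIC_RUN = re.compile(r"/([^/\n]+)/")


def tag_italics(text: str, classify: Callable[[str, str], str]) -> str:
    """Replace each /italic/ run with a generalized SGML element.

    classify(run, context) returns the element name for one occurrence.
    The abstract meaning is not in the file, so it must come from outside.
    """

    def replace(match: re.Match) -> str:
        run = match.group(1)
        # Hand the classifier up to 30 characters on either side for context.
        start = max(0, match.start() - 30)
        context = text[start:match.end() + 30]
        element = classify(run, context)
        return f"<{element}>{run}</{element}>"

    return ITALIC_RUN.sub(replace, text)


if __name__ == "__main__":
    # Hypothetical decisions recorded by an editor. Keyed by the italic text
    # here for brevity; a real tool would ask about each occurrence.
    decisions = {"sic": "foreign", "grep": "keyword"}

    def classify(run: str, context: str) -> str:
        # Fall back to plain emphasis when no decision was recorded.
        return decisions.get(run, "emphasis")

    print(tag_italics("He typed /grep/, wrote /sic/, and meant /it/.", classify))
    # prints: He typed <keyword>grep</keyword>, wrote <foreign>sic</foreign>, and meant <emphasis>it</emphasis>.
```

The regular expression is deliberately crude: it would misfire on lines containing ordinary slashes (a date such as 12/5/88, for instance), which is one more reason a human has to stay in the loop.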
The cruder text structure, such as paragraphs and headers, can be
marked up automatically, however, provided enough intelligence went
into the design of the specific markup in the first place.
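For the cruder structure, a minimal Python sketch of automatic paragraph markup. It assumes (an assumption about A.'s file, not something stated above) that paragraphs are separated by blank lines once the control codes are stripped. It wraps each block in <p>...</p> and escapes & and < so that ordinary text characters cannot be mistaken for markup; the function names are hypothetical.

```python
import re


def escape_sgml(text: str) -> str:
    # Escape & first so the entity references added below are not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def mark_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join the block's internal line breaks into single spaces.
        body = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if body:
            paragraphs.append(f"<p>{escape_sgml(body)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First paragraph,\nwrapped over two lines.\n\nSecond & last one.\n"
    print(mark_paragraphs(sample))
```

Headers could be handled the same way if the word processor marked them consistently; that consistency is exactly the intelligence the original specific markup needs to have had.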
The software he should use has yet to be written. I'd hate to see it
written for a PC, though. Better to write the software on real
computers and port it down to such lowly beasts only when fancy user
interfaces featuring "novice friendly" interactions with the system
won't interfere with the desired functionality and abilities.
Sorry for sounding so negative. I had visions of a more fundamental
approach to the problem at hand than working out how to shoe-horn scurvy
PC software encoding formats into the framework of a generalized markup.
In my view, you can easily go from generalized markup to specific markup,
but you cannot, in principle, go from specific markup to generalized
markup, because information is lost in the generalized -> specific layout
process. I have also worked with ODA, which in this respect is more
powerful than SGML. ODA lacks SGML's abstraction abilities, as well as
several important tools such as entities and marked sections, but it
does encompass both generalized and specific markup, and it keeps them
strictly separate. SGML does not address specific markup at all, even
though some of the people on this list seem to think it does. SGML
addresses information structure, and we should make the most
out of that. The presentation of the text is a simple matter in