Parsing SGML text to omit the tags

Robert A Amsler-2
What Anders Thulin proposes, namely an SGML parser that
would output the text without the tags, isn't possible, because
without the tags the text would be damaged: footnotes and
other material that would be typeset at some
other location in the production document would be run together
with the surrounding text.

Pure SGML-tagged text does not necessarily contain ANY carriage-returns
or blanks for appearance. Thus headings, title page materials, etc.
would just come out streamed together.
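
As a rough illustration of the run-together problem, here is a minimal Python sketch. The sample fragment, its tag names (chapter, title, p, fn) and the strip_tags helper are all made up for the example; the regular expression is a naive stand-in for tag removal, not a real SGML parser.

```python
import re

# Hypothetical SGML fragment: no blanks or line breaks added for appearance.
SAMPLE = (
    "<chapter><title>Results</title>"
    "<p>The method works<fn>See appendix B.</fn> in most cases.</p>"
    "<p>Further tests are planned.</p></chapter>"
)


def strip_tags(sgml: str) -> str:
    """Delete anything that looks like <...>; no parsing, no formatting."""
    return re.sub(r"<[^>]*>", "", sgml)


print(strip_tags(SAMPLE))
# ResultsThe method worksSee appendix B. in most cases.Further tests are planned.
```

The heading fuses with the first paragraph, the footnote lands in the middle of its sentence, and the paragraph break disappears.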

What is actually being requested is the release of an SGML text
formatter that produces output for devices with particular
characteristics, presumably 80-column lines
and no fixed number of lines per page. To appreciate how strange
some documents would look in that format, imagine a dissertation
without pages, or a newspaper or magazine article without columns;
and both with all text cross-references incompletely represented
and positioned in-line rather than at the end of the article.

Stripping out SGML is similar to stripping out TeX or troff.
You want to FORMAT the tags, not STRIP them, and formatting requires
decisions about presentation options. How are paragraphs
to be noted? Headings? Emphasis? How wide should the text be
per line? Should it be filled? Are quotations to have different
margins than regular text? Is the text to be hyphenated to better
fit on the lines? What type of numbering is to be used for
chapters, sections, and subsections? And so on.
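
To make the point concrete, here is a small Python sketch of a formatter in which each of those questions becomes an explicit option. Everything in it is an assumption for illustration: the Options fields, the format_text function, and the toy tag set (title, p, quote, em), which is assumed to be non-nested apart from em. It is not a real SGML formatter and ignores DTDs, entities, hyphenation and numbering.

```python
import re
import textwrap
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical presentation choices a formatter has to make."""
    width: int = 80                  # line width of the output device
    fill: bool = True                # re-fill paragraphs to the width?
    quote_indent: int = 4            # extra left margin for quotations
    underline_headings: bool = True  # how headings are marked
    emphasis_mark: str = "*"         # how emphasis is marked in plain text


# Toy tag set, assumed for the example: block elements do not nest.
ELEMENT = re.compile(r"<(title|p|quote)>(.*?)</\1>", re.S)
EMPH = re.compile(r"<em>(.*?)</em>", re.S)


def format_text(sgml: str, opts: Options) -> str:
    """Render the toy markup as plain text according to the options."""
    blocks = []
    for name, body in ELEMENT.findall(sgml):
        # Emphasis: wrap the emphasized words in the chosen marker.
        mark = opts.emphasis_mark
        body = EMPH.sub(lambda m: mark + m.group(1) + mark, body)
        # Whitespace in the source carries no layout meaning; normalize it.
        body = " ".join(body.split())
        if name == "title":
            block = body
            if opts.underline_headings:
                block += "\n" + "=" * len(body)
        else:
            indent = " " * (opts.quote_indent if name == "quote" else 0)
            if opts.fill:
                block = textwrap.fill(
                    body,
                    width=opts.width,
                    initial_indent=indent,
                    subsequent_indent=indent,
                )
            else:
                block = indent + body
        blocks.append(block)
    # Paragraphs are noted by a blank line between blocks.
    return "\n\n".join(blocks)


SAMPLE = (
    "<title>Results</title>"
    "<p>The method works in <em>most</em> cases.</p>"
    "<quote>Formatting is a decision.</quote>"
)

print(format_text(SAMPLE, Options(width=30)))
```

Changing any field of Options changes the output, which is the point: the tags alone do not determine the plain-text result, the presentation decisions do.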