MORE ON ASCII ETEXT

Robin C. Cover
Michael Hart's renewed request for "plain ascii" text from TEI, or text
that can be read on "normal" computers, is analogous to the dilemma faced
by this chap who wants LaTeX commands removed but structure retained:

> From: [hidden email] (Sanjiv K. Bhatia)
> Newsgroups: comp.text.tex,comp.text
> Subject: Help with removing latex commands
> Keywords: NOT delatex
> Organization: Comp Sci and Engr, Univ. of Nebr.
>
> I am looking for a program that will remove all the LaTeX specific commands
> from a document while preserving the structure of the document.  I have
> delatex but that removes the structure of the document.  I do not mind if
> things like equations, tables, and pictures are removed.  All I am
> interested in is plain ASCII text.
>
> I have looked into dvitty but that messes up the words.
>
> Thanks for any pointers.
>
> Sanjiv
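
As a rough illustration of what the poster is asking for, here is a minimal Python sketch that strips common LaTeX commands while keeping headings, lists, and paragraph breaks. It is not delatex or any existing utility; the function name `strip_latex`, the underline mapping, and the list of dropped environments are assumptions made for this example, and it handles only simple, non-nested markup.

```python
import re

# Underline characters for headings; this mapping is an illustrative choice.
HEADING_UNDERLINE = {"section": "=", "subsection": "-"}

# Environments dropped wholesale, as the poster allows.
DROPPED_ENVS = "equation|table|tabular|picture|figure"


def strip_latex(source):
    """Remove common LaTeX markup, keeping headings, lists and paragraphs."""
    # Comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Equations, tables and pictures are removed together with their content.
    text = re.sub(
        r"\\begin\{(" + DROPPED_ENVS + r")\*?\}.*?\\end\{\1\*?\}",
        "",
        text,
        flags=re.DOTALL,
    )

    # Section titles become underlined plain-text headings.
    def heading(match):
        title = match.group(2)
        return f"\n{title}\n{HEADING_UNDERLINE[match.group(1)] * len(title)}\n"

    text = re.sub(r"\\(section|subsection)\*?\{([^{}]*)\}", heading, text)
    # List items become bullets.
    text = re.sub(r"\\item\s*", "  * ", text)
    # Purely structural or bookkeeping commands vanish with their argument.
    text = re.sub(
        r"\\(begin|end|documentclass|usepackage|label)(\[[^\]]*\])?\{[^{}]*\}",
        "",
        text,
    )
    # Remaining one-argument commands such as \emph{...} keep their argument.
    text = re.sub(r"\\[a-zA-Z]+\*?\{([^{}]*)\}", r"\1", text)
    # Bare commands such as \noindent are dropped; escaped characters unescaped.
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)
    text = re.sub(r"\\([%&$#_{}])", r"\1", text)
    # Tidy whitespace: trailing blanks and runs of empty lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n")


SAMPLE = r"""\documentclass{article}
\begin{document}
\section{Intro}
Some \emph{plain} text. % a comment
\begin{equation}
E = mc^2
\end{equation}
\begin{itemize}
\item first
\item second
\end{itemize}
\end{document}
"""

print(strip_latex(SAMPLE))
```

The heading and the list survive as plain-text conventions, while the equation disappears entirely. Regular expressions cannot cope with nested braces or user-defined macros, so a real tool would need an actual parser.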

At one level, I am sympathetic with Mr. Hart's request: I often receive
files in Script or TeX which I can't read (easily) on my PC, so I've looked
for utilities to "remove" the formatting elegance, leaving me with
aesthetically-impoverished raw-ascii text that indeed can be read with DOS
"type."  I recall feeling very sheepish when I asked a TeX guru for
something that would just print a standard page from a DVI file:
fixed-pitch, uniform point size, just spitting out "the words."  (I
understand such a utility does exist, though I've never found a DOS
version; it seems like a nice alternative to cluttering a hard disk with 10
megabytes of TeX fonts.)

With SGML (TEI) markup, however, such a request makes less sense.  It's not
just "formatting" structure that gets removed when you take out descriptive
markup -- indeed, it's a conviction dear to SGML that formatting information
be kept out of the encoding, thus separating content from presentation --
but vital **information about the content.**  The richer the encoding
(analytical-interpretive information, such as literary and linguistic
scholars need for quantitative study of tagged corpora), the greater,
obviously, the loss of information when one removes the tagging.  Others
have pointed out that in some cases, removal of all tagging may not leave a
sensible or useful residue.
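
The loss described above can be made concrete with a small Python sketch. The element and attribute names (`s`, `w`, `lemma`, `pos`) are hypothetical and are not taken from the TEI guidelines, and the sample is written as well-formed XML only so that the standard library parser can read it; real SGML would need an SGML parser.

```python
import xml.etree.ElementTree as ET

# Two homographs, told apart only by their (hypothetical) annotations.
SAMPLE = (
    '<s lang="en">'
    '<w lemma="see" pos="verb">saw</w> '
    '<w lemma="saw" pos="noun">saw</w>'
    "</s>"
)


def plain_text(markup):
    """Return the character data alone, with every tag removed."""
    return "".join(ET.fromstring(markup).itertext())


def annotations(markup):
    """Return (word, lemma, part of speech) for each tagged word."""
    root = ET.fromstring(markup)
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


print(plain_text(SAMPLE))
print(annotations(SAMPLE))
```

Once the tags are gone, the two words are indistinguishable; the lemma and part-of-speech information cannot be recovered from the plain text.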

Quite a bit of software development has already been done to provide
"translators" between SGML-tagged texts and formatters.  According to one
source, "Image Network (the xroff people) developed some sort of
public-domain conversion tool for the U.S. government to convert SGML into
xroff."  You can get PD versions of x/troff for a PC -- would that qualify
as a "normal" machine within reach of the masses?  NIST is developing
similar tools, as well as a PD SGML parser.  It may be expected that
development of sophisticated software will continue to be done on large
systems, but surely the best results will filter down to inexpensive micro
platforms.  SGML editors are already available for Mac and DOS
microcomputers.

I agree with Lou Burnard (and others) that the best way to think about
SGML/TEI tagged texts is not with an impulse to remove encoding, but to make
intelligent use of it.  That's why I posted summaries of LECTOR (Waterloo)
and DynaBook (Electronic Book Technologies): this is software driven by
user-defined electronic stylesheets that permit dynamic viewing of these
texts, suppressing or revealing levels of structure and content objects,
or classes of content objects, as optimally suit **YOUR** research goals
at a given moment.  These software tools permit searching and/or hypertext
browsing based upon GIs (generic identifiers, i.e. element names) that
describe/delimit text regions in an intelligent way.  I predict, contrary to
Lou's "doctrinaire" verdict (which I feel he
did not mean to stifle intelligent discussion), that "pretty-printing" of
SGML (TEI) documents will become increasingly feasible and sensible, especially
for some document classes: AAP/EPSIG and other movements are dedicated to
making it happen on paper.  But to the extent that encoded texts readily
become hypertexts through encoding enrichment, it makes less and less sense
to believe we can do justice to these texts in a single view or screen shot.
The SIL people (e.g., Gary Simons) have a potent claim that texts *ARE*
linguistically/literarily multi-dimensional whether we recognize this or not:
when we document the multi-dimensionality in encoding (lexical mappings;
morphological analyses; etc.) then we betray our convictions in asking to
see these texts (on ascii terminals) in a single plane.
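
The idea of suppressing or revealing content objects by GI, and of searching by GI, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how LECTOR or DynaBook work, the element names (`poem`, `title`, `line`, `note`) and the functions `view` and `find_by_gi` are invented for the example, and the sample is well-formed XML so that the standard library parser can read it.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<poem>"
    "<title>Example</title>"
    "<line>First line <note>[editorial gloss]</note></line>"
    "<line>Second line</line>"
    "</poem>"
)


def render(element, hidden):
    """Return the text of `element`, skipping subtrees whose GI is hidden."""
    if element.tag in hidden:
        return ""
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, hidden))
        # Text following a child belongs to the parent, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def view(markup, hidden=frozenset()):
    """Render each top-level element on its own line, minus hidden GIs."""
    root = ET.fromstring(markup)
    lines = [render(child, hidden).strip() for child in root
             if child.tag not in hidden]
    return "\n".join(lines)


def find_by_gi(markup, gi):
    """Return the text content of every element with the given GI."""
    root = ET.fromstring(markup)
    return ["".join(e.itertext()) for e in root.iter(gi)]


print(view(DOC))                      # everything revealed
print(view(DOC, hidden={"note"}))     # editorial notes suppressed
print(find_by_gi(DOC, "note"))        # search restricted to one GI
```

The same encoded text yields different views depending on the set of hidden GIs, which is the sense in which no single screen does the text justice.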

Forgive me if some of the subtlety of this discussion has eluded me -- but I
think affordable software for editing/viewing SGML-TEI texts will be
available by the time these texts are encoded under mature guidelines.  If
the very *BEST* software for such purposes will not be available or
affordable on my personal computer (it won't) -- well, that's the way the
world is already.  We have to live in it.

unrefined and unedited musings by...
Robin Cover

Re: MORE ON ASCII ETEXT

Michael S. Hart-2
On Wed, 14 Nov 90 23:12:05 CST Robin C. Cover said:
>Michael Hart's renewed request for "plain ascii" text from TEI, or text
>that can be read on "normal" computers, is analogous to the dilemma faced
>by this chap who wants LaTeX commands removed but structure retained:

No, as Robin is most certainly well aware, my suggestions for inclusion in
the TEI proposals, guidelines, etc., are not

"analogous to the dilemma faced by this chap . . . ."

Rather, this is a request to include the vast majority of users, with the
various computers and programs they have in use, instead of a limitation to
the small percentage who are SGML oriented.  This proposal in no way,
no way at all, would change the files which are in SGML format; it would
only add an easy means of access for users of popular computers and
programs.  This
request is not meant to change, limit, or otherwise have any effect on the
users of the SGMLified texts, only to broaden the base of accessibility to
the 99% of computer users who are not SGML oriented.


Thank you for your interest,

Michael S. Hart, Director, Project Gutenberg


INTERNET:  [hidden email]
BITNET:    [hidden email]

The views expressed herein do not necessarily reflect
the views of any person or institution.  Neither Prof
Hart nor Project Gutenberg have any official contacts
with the University of Illinois.

"NOTICE:  Due to the shortage of ROBOTS and COMPUTERS
 some of our workers are HUMAN and therefore will act
 unpredictably when abused."