Michael Hart's renewed request for "plain ascii" text from TEI, or text
that can be read on "normal" computers, is analogous to the dilemma faced
by this chap who wants LaTeX commands removed but structure retained:

> From: [hidden email] (Sanjiv K. Bhatia)
> Newsgroups: comp.text.tex,comp.text
> Subject: Help with removing latex commands
> Keywords: NOT delatex
> Organization: Comp Sci and Engr, Univ. of Nebr.
>
> I am looking for a program that will remove all the LaTeX specific commands
> from a document while preserving the structure of the document. I have
> delatex but that removes the structure of the document. I do not mind if
> things like equations, tables, and pictures are removed. All I am
> interested in is plain ASCII text.
>
> I have looked into dvitty but that messes up the words.
>
> Thanks for any pointers.
>
> Sanjiv

At one level, I am sympathetic with Mr. Hart's request: I often receive
files in Script or TeX which I can't read (easily) on my PC, so I've
looked for utilities to "remove" the formatting elegance, leaving me with
aesthetically-impoverished raw-ascii text that indeed can be read with
DOS "type." I recall feeling very sheepish when I asked a TeX guru for
something that would just print a standard page from a DVI file:
fixed-pitch, uniform point size, just spitting out "the words." (I
understand such a utility does exist, though I've never found a DOS
version; it seems like a nice alternative to cluttering a hard disk with
10 megabytes of TeX fonts.)

With SGML (TEI) markup, however, such a request makes less sense.
It's not just "formatting" structure that gets removed when you take out
descriptive markup -- indeed, it's a conviction dear to SGML that
formatting information be kept out of the encoding, thus separating
content from presentation -- but vital **information about the content.**
The richer the encoding (analytical-interpretive information, such as
literary and linguistic scholars need for quantitative study of tagged
corpora), the greater the loss of information when one removes the
tagging. Others have pointed out that in some cases, removal of all
tagging may not leave a sensible or useful residue.

Quite a bit of software development has already been done to provide
"translators" between SGML-tagged texts and formatters. According to one
source, "Image Network (the xroff people) developed some sort of
public-domain conversion tool for the U.S. government to convert SGML
into xroff." You can get PD versions of x/troff for a PC -- would that
qualify as a "normal" machine within reach of the masses? NIST is
developing similar tools, as well as a PD SGML parser. It may be expected
that development of sophisticated software will continue to be done on
large systems, but surely the best results will filter down to
inexpensive micro platforms. SGML editors are already available for Mac
and DOS microcomputers.

I agree with Lou Burnard (and others) that the best way to think about
SGML/TEI tagged texts is not with an impulse to remove encoding, but to
make intelligent use of it. That's why I posted summaries of LECTOR
(Waterloo) and DynaBook (Electronic Book Technologies): this is software
driven by user-defined electronic stylesheets that permit dynamic viewing
of these texts, suppressing or revealing levels of structure and content
objects, or classes of content objects, as optimally suit **YOUR**
research goals at a given moment.
These software tools permit searching and/or hypertext browsing based
upon GI's that describe/delimit text regions in an intelligent way.

I predict, contrary to Lou's "doctrinaire" verdict (which I feel he did
not mean to stifle intelligent discussion), that "pretty-printing" of
SGML (TEI) documents will become increasingly feasible and sensible,
especially for some document classes: AAP/EPSIG and other movements are
dedicated to making it happen on paper. But to the extent that encoded
texts readily become hypertexts through encoding enrichment, it makes
less and less sense to believe we can do justice to these texts in a
single view or screen shot. The SIL people (e.g., Gary Simons) have a
potent claim that texts *ARE* linguistically/literarily multi-dimensional
whether we recognize this or not: when we document the
multi-dimensionality in encoding (lexical mappings; morphological
analyses; etc.), then we betray our convictions in asking to see these
texts (on ascii terminals) in a single plane.

Forgive me if some of the subtlety of this discussion has eluded me --
but I think affordable software for editing/viewing SGML-TEI texts will
be available by the time these texts are encoded under mature guidelines.
If the very *BEST* software for such purposes will not be available or
affordable on my personal computer (it won't) -- well, that's the way the
world is already. We have to live in it.

unrefined and unedited musings by...

Robin Cover
On Wed, 14 Nov 90 23:12:05 CST Robin C. Cover said:
>Michael Hart's renewed request for "plain ascii" text from TEI, or text
>that can be read on "normal" computers, is analogous to the dilemma faced
>by this chap who wants LaTeX commands removed but structure retained:

No, as Robin is most certainly well aware, my suggestions for inclusion
in the TEI proposals, guidelines, etc., are not "analogous to the dilemma
faced by this chap . . . ." Rather, this is a request for the inclusion
of the vast majority of users of the various computers and programs in
use, instead of a limitation to the small percentage who are SGML
oriented.

This proposal in no way, no way at all, would change the files which are
in SGML format, but should only add an easy manner of access for users of
popular computers and programs.

This request is not meant to change, limit, or otherwise have any effect
on the users of the SGMLified texts, only to broaden the base of
accessibility to the 99% of computer users who are not SGML oriented.

Thank you for your interest,

Michael S. Hart, Director, Project Gutenberg
INTERNET: [hidden email]
BITNET: [hidden email]

The views expressed herein do not necessarily reflect the views of any
person or institution. Neither Prof Hart nor Project Gutenberg have any
official contacts with the University of Illinois.

"NOTICE: Due to the shortage of ROBOTS and COMPUTERS some of our workers
are HUMAN and therefore will act unpredictably when abused."