This note from Michael S. Hart was diverted by the list server for uninteresting technical reasons, but was intended for this list, as well as for the GUTNBERG list which Mr. Hart moderates. -CMSMcQ

--------
From: "Michael S. Hart" <[hidden email]>
Subject: Text Encoding and Decoding (Initiative)
X-To: Text Encoding Initiative Discussion Group <[hidden email]>

This column is a response to the tenets of the Text Encoding Initiative as perceived through various statements of theirs, both in print and in electronic media such as listserver mailings on the order of Humanist. If any readers have additional material which either supports or contradicts any of the generalizations made herein, an email copy of it sent to this address would be appreciated and, I hope, worth a mention, if not inclusion, in a future column.

The Text Encoding Initiative (TEI) appears at first glance to be a wonderful thing: a movement to encourage the production of a vast library of easy-to-use electronic texts (etexts). "Text encoding" here refers not so much to the actual encoding, as it were, of printed matter into universal etexts, as to a specific computer-oriented language: the Standard Generalized Markup Language (SGML). This language does not so much translate the text into an etext which can only be read by computers as it makes additions to the etext, additions which point out various points of interest to an army of scholars, who are, as a general rule, the target audience for such things.

SGML does not remove anything from an etext, but it adds so much that it becomes difficult or impossible for the normal reader to scan the material in the manner you are likely to be scanning this column right now. Instead of a universal format which both humans and computers can read with ease, various codes are entered by the encoders right into the text, codes which, as far as I have been able to determine, do not appear only at the ends of sentences, paragraphs, pages, chapters or what have you.
If they did, humans could easily develop reading habits that allow the eye and the mind to skip over the notations whenever a reader wanted to pay attention to the text only.

I have mentioned this on several occasions to the members of TEI with whom I am electronically acquainted, either by email or by phone. The responses have always been the same: this IS pure ASCII text, and it doesn't need a method, inclusive or exclusive to the TEI program, to strip it of an interesting and useful set of added notation.

I predict that if and when SGML becomes widespread, strip features will be added not only to the authorized programs a person might use to work with SGML texts, but also to the various and quite popular text search programs, just as those programs now include a strip feature which removes the high bits from WordStar text. Almost all programs now contain options which allow files to be transported to other programs for other uses. Unless TEI is intentionally being narrow-minded about the scope of people, programs, and other, perhaps as yet unforeseen, applications, they will provide the most universal electronic texts possible.

I continue to request this feature, much as others requested WordStar strippers for the odd characters a normal text reader would see at the ends of lines, paragraphs and other locations. These characters were not in the lower ASCII set most of us refer to as pure ASCII, and while there were not as many of them as in most SGML texts I have looked over, they actually changed the last character in some lines and paragraphs by setting an eighth bit. This was useful for the WordStar program but annoying to the reader, especially when several logical choices were apparent to the reader, or perhaps no logical choice at all.

So far, in the world of electronic text, each provider seems to be insisting on pushing its own products at least as much as, if not more than, the etexts themselves.
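The two kinds of "strip" feature discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any actual WordStar stripper or TEI tool: `strip_high_bits` simply clears the eighth bit of every byte, which is the effect the column describes, and `strip_markup` is a hypothetical helper that deletes anything shaped like an SGML tag. A real SGML processor would also have to deal with entity references, comments and marked sections, which this naive pattern ignores.

```python
import re


def strip_high_bits(data: bytes) -> str:
    """Clear the eighth bit of each byte so only 7-bit ("pure") ASCII remains."""
    # b & 0x7F keeps the low seven bits; the result always decodes as ASCII.
    return bytes(b & 0x7F for b in data).decode("ascii")


# Hypothetical and deliberately naive: treats any "<...>" run as a tag.
# It does not understand entities (&amp;), comments, or marked sections.
_TAG = re.compile(r"<[^<>]*>")


def strip_markup(text: str) -> str:
    """Remove tag-like markup, leaving only the running text."""
    return _TAG.sub("", text)


if __name__ == "__main__":
    # 'l' is 0x6C; with the eighth bit set it becomes 0xEC.
    print(strip_high_bits(b"Call me Ishmae\xec."))
    print(strip_markup("<p>Call me <name>Ishmael</name>.</p>"))
```

Both calls print `Call me Ishmael.`; the point is only that such filters are small enough to bundle with ordinary text programs.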
Some providers' policies even include the insertion of textual errors which allow identification, for purposes of copyright protection, of electronic texts that would reside in the public domain were it not for a markup, page numbering, or other scheme to create artificial but legal grounds for copyright protection. Let us not see SGML and TEI used in a similar manner, in service of a restrictive rather than an open academic policy.

Thank you for your interest,

Michael S. Hart, Director, Project Gutenberg
National Clearinghouse for Machine Readable Texts

THESE NOTES ARE USUALLY WRITTEN AT A LIVE TERMINAL, AND THE CHOICE OF WORDS IS OFTEN MEANT TO BE SUCH AS TO PROVOKE THE GREATEST POSSIBLE RESPONSE SHORT OF BEING OFFENSIVE. TRUTH IN THESE NOTES IS OF GREAT CONCERN, THE FORM IS SECONDARY - OTHER THAN THE TOKEN EFFORT OF JUSTIFIED RIGHT MARGINATION.

BITNET: HART@UIUCVMD INTERNET: [hidden email]
(*ADDRESS CHANGE FROM *VME* TO *VMD* AS OF DECEMBER 18!!**)
(THE GUTNBERG SERVER IS LOCATED AT [hidden email])

NEITHER THE ABOVE NAMED INDIVIDUALS NOR ORGANIZATIONS ARE AN OFFICIAL REPRESENTATIVE OF ANY OTHER INSTITUTION NOR ARE THE ABOVE COMMENTS MEANT TO IMPLY THE POLICIES OF ANY OTHER PERSONS OR INSTITUTIONS, THOUGH OF COURSE WE WISH THEY DID.