Michael Hart (Project Gutenberg) writes:
I propose that the Text Encoding Initiative include, as part of their guidelines, programs, operations, etc., the inclusion of a requirement that access to TEI texts by word processors, search and retrieval programs, simple TYPE, LIST, GREP, CAT and other commands, so the great majority of computer users may benefit from these etexts.

The major problem with this is that the TEI doesn't intend to produce any texts--and in fact has no direct plans yet to produce any software. The TEI probably should have been named TESI (The Text Encoding Standards Initiative) to have avoided this confusion, but TEI is the name.

The second issue is whether in fact it is necessary that new standards fit old software? I.e. one could just as easily say that all WYSIWYG word processors should include commands for line-oriented editing or for batch-processing updates from command files.

One could also claim that newer SGML-smart software will have no trouble doing these tasks and that it is the job of the TEI to promote abandonment of software which does not support SGML.

To be conciliatory, there is merit to two points here.

*** First, `Is there a presentation markup isomorphic to the TEI
*** Guidelines?'

*** Second, `Should there be a presentation markup
*** format for the display of text with its SGML tags?'

The first issue is, in my opinion, the appropriate restatement of the question Michael Hart asked long ago about how to `strip out' the SGML tags. The answer is that the TEI hasn't in fact specified ANY presentation markup for its SGML Guidelines. SGML doesn't specify how to print text. (SGML isn't in fact about the printing of text; it is about the preservation of the content of text.)

I am personally skeptical that, merely using blank space and line breaks, one can acceptably represent the TEI Guidelines. It would certainly be challenging to try and do so, something like trying to flatten a hypertext in an acceptable manner. The goal is not without merit and probably deserves more discussion.

The second issue is one that I think does need some discussion. How should an SGML text be stored in a file? SGML doesn't say. As I've mentioned before, the OED2 is stored as a single stream of characters without any carriage-returns or padded blanks for presentation. SGML can be ``pretty-printed'' according to a fairly simple algorithm, i.e.:

  Every additional opening tag indents one more space on a new line.
  Every text line that does not fit in the `width' of the output is justified within the left-margin established by the last opening tag it contained and the prevailing right-margin.
  Closing tags reset the left-margin to the point it was at before they were opened.
  A sequence of immediately consecutive closing tags is represented together on one line.

Thus, something like:

    <ME><hw>apple</hw><pos>noun</pos><senses><def num=1><m>a fruit of a tree</m><eg>Eve gave Adam an <cw>apple</cw> in the Garden of Eden</eg></def><def num=2><m>a tree on which <xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of <cw>apple</cw> wood</eg></def></senses></ME>

would pretty-print as:

    <ME>
     <hw>apple</hw>
     <pos>noun</pos>
     <senses>
      <def num=1>
       <m>a fruit of a tree</m>
       <eg>Eve gave Adam an
        <cw>apple</cw> in the Garden of Eden</eg></def>
      <def num=2>
       <m>a tree on which
        <xr>apples
         <sn>1</sn></xr> grow</m>
       <eg>The box is made of
        <cw>apple</cw> wood</eg></def></senses></ME>

There are some problems here (as with all simple pretty-printing). The tags <cw>, <xr>, and <sn> don't "really" require a new line since they are "special" in this case, i.e. they are oriented more toward display at the "word" level than at the document-structure level. I think this sense that breaking to a new line is inappropriate here is due to our innate sense of what presentation markup should be for "text".
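The four rules above can be sketched in a few lines of Python. This is a minimal illustration, not Amsler's program: the function name `pretty_print` and the regex tokenizer are assumptions, the line-width justification rule is skipped entirely, and SGML comments, marked sections and entity references are not handled. It also drops the trailing blank before each inserted line break, so `an <cw>` and `an<cw>` print identically, a small instance of the problem described in the next paragraph.

```python
import re

# Tokens are either a tag (<...>) or a run of text between tags.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def pretty_print(sgml: str) -> str:
    """Naive SGML pretty-printer (no width/justification handling)."""
    lines = []       # finished output lines
    current = ""     # line currently being built
    depth = 0        # number of elements currently open
    for tok in TOKEN.findall(sgml):
        if tok.startswith("</"):
            # Closing tags stay on the current line and restore the margin.
            depth -= 1
            current += tok
        elif tok.startswith("<"):
            # Each opening tag starts a new line, one space deeper.
            if current:
                lines.append(current.rstrip())
            current = " " * depth + tok
            depth += 1
        else:
            current += tok  # text continues the current line
    if current:
        lines.append(current.rstrip())
    return "\n".join(lines)


entry = ("<ME><hw>apple</hw><pos>noun</pos><senses><def num=1>"
         "<m>a fruit of a tree</m><eg>Eve gave Adam an <cw>apple</cw> in the "
         "Garden of Eden</eg></def><def num=2><m>a tree on which "
         "<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of "
         "<cw>apple</cw> wood</eg></def></senses></ME>")
print(pretty_print(entry))
```

Run on the dictionary entry, it gives one line per opening tag, with runs of closing tags kept at the end of the line they close.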
It would be appropriate for a `smart' pretty-printer to have a list of tags that are `in-line' rather than organizational, and to pretty-print them differently.

There is another very subtle problem which probably most of you missed. There are extra blanks in the SGML. Notice that my SGML contained

    <eg>Eve gave Adam an <cw>apple</cw> in the Garden of Eden</eg>

rather than

    <eg>Eve gave Adam an<cw>apple</cw>in the Garden of Eden</eg>

Why? Because I am using presentation markup for blank spacing (and line breaks), i.e. I am assuming there is a blank between elements of the text. That is fine UNTIL one comes to elements which in the presentation markup might NOT have blanks between them, such as

    <xr>apples<sn>1</sn></xr> grow</m>

I.e., this `might' be presented as `apples-1' or `apples(1)' or any of a number of other stylistic mechanisms. It might ALSO not even be intended for printing in the presentation markup. Thus, when I `pretty-print' this line as

    <xr>apples
     <sn>1</sn></xr> grow</m>

I cannot go back to the original.

---
This is why I am a bit concerned about the implicit use of presentation markup in SGML text. However, it is probably preferable that something be said about the assumptions rather than everything left implicit.
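One way to act on the `smart' pretty-printer suggestion is sketched below, again as a hypothetical Python illustration rather than anything the TEI has specified. The set `INLINE_TAGS` is an assumption taken from the three word-level tags in the example (<cw>, <xr>, <sn>). In-line tags neither start a new line nor deepen the indentation, so mixed content such as `apples<sn>1</sn>` keeps its original spacing, and no blanks are trimmed. Under the further assumption that the source, like the OED2's single stream, contains no line breaks of its own, the hypothetical `flatten` helper removes the added layout and recovers the original exactly.

```python
import re

TOKEN = re.compile(r"<[^>]*>|[^<]+")

# Hypothetical list of word-level ("in-line") tags, from the example entry.
INLINE_TAGS = {"cw", "xr", "sn"}


def tag_name(tag: str) -> str:
    """Return the generic identifier: '<def num=1>' -> 'def', '</xr>' -> 'xr'."""
    return re.match(r"</?\s*([^\s>]+)", tag).group(1).lower()


def smart_pretty_print(sgml: str, inline=INLINE_TAGS) -> str:
    lines, current, depth = [], "", 0
    for tok in TOKEN.findall(sgml):
        if tok.startswith("<") and tag_name(tok) in inline:
            current += tok              # in-line tags stay in the running text
        elif tok.startswith("</"):
            depth -= 1                  # organizational close: restore margin
            current += tok
        elif tok.startswith("<"):
            if current:                 # organizational open: new, deeper line
                lines.append(current)
            current = " " * depth + tok
            depth += 1
        else:
            current += tok              # text is copied unchanged
    if current:
        lines.append(current)
    return "\n".join(lines)


def flatten(pretty: str) -> str:
    """Undo the layout: drop each line break plus the indentation after it."""
    return re.sub(r"\n *", "", pretty)


entry = ("<ME><hw>apple</hw><pos>noun</pos><senses><def num=1>"
         "<m>a fruit of a tree</m><eg>Eve gave Adam an <cw>apple</cw> in the "
         "Garden of Eden</eg></def><def num=2><m>a tree on which "
         "<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of "
         "<cw>apple</cw> wood</eg></def></senses></ME>")
pretty = smart_pretty_print(entry)
print(pretty)
```

The design choice is to insert whitespace only where the markup is organizational, which is exactly where the OED2-style stream has none, so nothing the reader sees as a blank is ambiguous.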
On Tue, 13 Nov 90 11:23:00 -0500 Robert A Amsler said:
>Michael Hart (Project Gutenberg) writes:
>
>I propose that the Text Encoding Initiative include, as part of their
>guidelines, programs, operations, etc., the inclusion of a requirement
>that access to TEI texts by word processors, search and retrieval programs,
>simple TYPE, LIST, GREP, CAT and other commands, so the great majority of
>computer users may benefit from these etexts.
>
>
>The major problem with this is that the TEI doesn't intend to produce
>any texts--and in fact has no direct plans yet to produce any
>software. The TEI probably should have been named TESI (The Text
>Encoding Standards Initiative) to have avoided this confusion, but
>TEI is the name.
>

produce any texts" which, as you can see, I neither said nor implied. He would also have you believe that not producing any etexts has bearing on the issue. The fact is that I am proposing that the TEI guidelines mean and be what they should be, something providing Initiative for Encoding, and not just encoding for the use of a single-digit percentage of users. If, as Mr. Amsler suggests, the name is inappropriate, then it should be changed, even though he implies it is written in stone when, in fact, it is written in electronic etext (sometimes transferred to paper).

>The second issue is whether in fact it is necessary that new
>standards fit old software? I.e. one could just as easily say
>that all WYSIWYG word processors should include commands for
>line-oriented editing or for batch-processing updates from
>command files.
>

Another dead herring: this time asking you, the reader, to fallaciously equate a simple proposal for the release of texts in those universally available formats known as "DOS text files," etc., with a proposal "that all WYSIWYG word processors should include commands for line-oriented editing or for batch-processing updates from command files."
The real issue here is whether or not TEI and its associated others can and should create guidelines which restrict practical usage of TEI text to a single-digit percentage of computer users. The alternative is the creation and distribution of etexts which would, could, and should have their home in virtually all computers around the world.

>One could also claim that newer SGML-smart software will have no
>trouble doing these tasks and that it is the job of the TEI to
>promote abandonment of software which does not support SGML.
>
>To be conciliatory, there is merit to two points here.
>
>*** First, `Is there a presentation markup isomorphic to the TEI
>*** Guidelines?'
>
>*** Second, `Should there be a presentation markup
>*** format for the display of text with its SGML tags?'
>
>The first issue is, in my opinion, the appropriate restatement of the
>question Michael Hart asked long ago about how to `strip out' the
>SGML tags.

Actually, this question was still under discussion last month, which of course belies another fallacious argument: that you should ignore all the points because they are old. It is fallacious on two counts: first, that being old somehow strips a point of its truth; second, that this issue is "long ago about how to `strip out' the SGML tags." Not to mention that one might not want, and perhaps should not want, a "restatement of (one's) question," particularly when one has not been consulted.

>The answer is that the TEI hasn't in fact specified
>ANY presentation markup for its SGML Guidelines. SGML doesn't specify
>how to print text. (SGML isn't in fact about the printing of text;
>it is about the preservation of the content of text).
>

Actually, I have never mentioned "the printing of text," only the way the text appears to the eye when viewed as a standard textfile with the standard modes for reading textfiles. Of course, in this mode it might become more easily printable.
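Hart's request, that a TEI text be usable with TYPE, LIST, GREP and the like, amounts to deriving a plain text file from the tagged one. A minimal Python sketch follows; it is an illustration under stated assumptions, not a TEI-defined conversion. `to_plain_text` and `INLINE_TAGS` are hypothetical names: tags in the in-line set are simply deleted, every other tag becomes a line break, and entity references and attribute values are ignored. Its output on Amsler's dictionary entry also bears out his point that word-level tags still need a presentation decision, since `apples<sn>1</sn>` comes out as `apples1`.

```python
import re

TAG = re.compile(r"<[^>]*>")

# Hypothetical word-level tags, deleted without adding any spacing.
INLINE_TAGS = {"cw", "xr", "sn"}


def to_plain_text(sgml: str, inline=INLINE_TAGS) -> str:
    """Strip tags: in-line tags vanish, structural tags become line breaks."""
    def replace(match):
        name = re.match(r"</?\s*([^\s>]+)", match.group(0)).group(1).lower()
        return "" if name in inline else "\n"

    text = TAG.sub(replace, sgml)
    # Trim each line and drop the empty ones left by adjacent tags.
    lines = (line.strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


entry = ("<ME><hw>apple</hw><pos>noun</pos><senses><def num=1>"
         "<m>a fruit of a tree</m><eg>Eve gave Adam an <cw>apple</cw> in the "
         "Garden of Eden</eg></def><def num=2><m>a tree on which "
         "<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of "
         "<cw>apple</cw> wood</eg></def></senses></ME>")
print(to_plain_text(entry))
```

The result is an ordinary line-oriented text file that a GREP for "apple wood" would find, at the cost of discarding everything the tags recorded.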
>I am personally skeptical that, merely using blank space and line
>breaks, one can acceptably represent the TEI Guidelines. It
>would certainly be challenging to try and do so, something like
>trying to flatten a hypertext in an acceptable manner. The goal
>is not without merit and probably deserves more discussion.
>

For the umpteenth time I must correct this misquotation: my proposal is only to include the easy use of these texts ALONG WITH OTHER OF THE TEI GUIDELINES, NOT TO REPLACE ANY OTHER GUIDELINES. Access is the point of the proposal: access for those who have normal access to normal programs for reading, searching or retrieving text files on relatively "normal" computers, if such a thing exists.

>The second issue is one that I think does need some discussion.
>How should an SGML text be stored in a file? SGML doesn't say.
>As I've mentioned before, the OED2 is stored as a single stream
>of characters without any carriage-returns or padded blanks for
>presentation. SGML can be ``pretty-printed'' according to a fairly
>simple algorithm, i.e.,

xxxxxxxx Many lines about ``pretty-printing'' deleted. xxxx

>---
>This is why I am a bit concerned about the implicit use of presentation
>markup in SGML text. However, it is probably preferable that something
>be said about the assumptions rather than everything left implicit.

End of Mr. Amsler's note.

My note must also end here, and perhaps should have ended before it began. I fear this may not have been the best way to deal with the matter, and I welcome assistance.

Thank you,
Michael S. Hart