Re: Michael Hart's suggestions for the TEI

Robert A Amsler
Michael Hart (Project Gutenberg) writes:

    I propose that the Text Encoding Initiative include, as part of
    their guidelines, programs, operations, etc., the inclusion of a
    requirement that access to TEI texts by word processors, search
    and retrieval programs, simple TYPE, LIST, GREP, CAT and other
    commands, so the great majority of computer users may benefit
    from these etexts.

The major problem with this is that the TEI doesn't intend to produce
any texts--and in fact has no direct plans yet to produce any
software. The TEI probably should have been named TESI (The Text
Encoding Standards Initiative) to have avoided this confusion, but
TEI is the name.

The second issue is whether in fact it is necessary that new
standards fit old software? I.e. one could just as easily say
that all WYSIWYG word processors should include commands for
line-oriented editing or for batch-processing updates from
command files.

One could also claim that newer SGML-smart software will have no
trouble doing these tasks and that it is the job of the TEI to
promote abandonment of software which does not support SGML.

To be conciliatory, there is merit to two points here.

*** First, `Is there a presentation markup isomorphic to the TEI
*** Guidelines?'

*** Second, `Should there be a presentation markup
*** format for the display of text with its SGML tags?'

The first issue is, in my opinion, the appropriate restatement of the
question Michael Hart asked long ago about how to `strip out' the
SGML tags. The answer is that the TEI hasn't in fact specified
ANY presentation markup for its SGML Guidelines. SGML doesn't specify
how to print text. (SGML isn't in fact about the printing of text;
it is about the preservation of the content of text).
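Hart's question of how to `strip out' the tags can be sketched naively, assuming a tag is simply anything between `<' and `>' and that tags are deleted outright (the helper name strip_tags is hypothetical; entity references and marked sections are not handled). Run on one of the dictionary fragments used later in this note, it shows why SGML alone does not settle the question: the sense number ends up glued onto the cross-referenced word.

```python
import re

# Hypothetical helper: treat anything between '<' and '>' as a tag.
# Entity references (e.g. &amp;) are left untouched in this sketch.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(sgml: str) -> str:
    """Delete every tag, keeping the character data between tags as-is."""
    return TAG_RE.sub("", sgml)


print(strip_tags("<m>a tree on which <xr>apples<sn>1</sn></xr> grow</m>"))
# prints: a tree on which apples1 grow
```

Whether <sn> should become `apples-1', `apples(1)', or nothing at all is a presentation decision; deleting the tags just makes an arbitrary one.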

I am personally skeptical that one can acceptably represent the TEI
Guidelines merely by using blank space and line breaks. It would
certainly be challenging to try to do so, something like trying to
flatten a hypertext in an acceptable manner. The goal is not without
merit and probably deserves more discussion.

The second issue is one that I think does need some discussion.
How should an SGML text be stored in a file? SGML doesn't say.
As I've mentioned before, the OED2 is stored as a single stream
of characters without any carriage-returns or padded blanks for
presentation. SGML can be ``pretty-printed'' according to a fairly
simple algorithm, i.e.,

Every additional opening tag indents one more space on a new line.

Every text line that does not fit in the `width' of the output
is justified within the left-margin established by the last
opening tag it contained and the prevailing right-margin.

Closing tags reset the left-margin to the point it was at
before they were opened.

A sequence of immediately consecutive closing tags is represented
together on one line.

Thus, something like:

<ME><hw>apple</hw><pos>noun</pos><senses><def num=1><m>a fruit of a
tree</m><eg>Eve gave Adam an <cw>apple</cw> in the Garden of
Eden</eg></def><def num=2><m>a tree on which
<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of
<cw>apple</cw> wood</eg></def></senses></ME>

Would pretty-print as:

<ME>
 <hw>apple</hw>
 <pos>noun</pos>
 <senses>
  <def num=1>
   <m>a fruit of a tree</m>
   <eg>Eve gave Adam an
    <cw>apple</cw> in the Garden of Eden</eg></def>
  <def num=2>
   <m>a tree on which
    <xr>apples
     <sn>1</sn></xr> grow</m>
   <eg>The box is made of
    <cw>apple</cw> wood</eg></def></senses></ME>
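The indentation rules can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a reference implementation: it leaves out the width-justification rule entirely, assumes no `>' appears inside attribute values, and assumes every element has an explicit closing tag (the example does, but SGML with omitted end tags would need the DTD). The names pretty_print and SAMPLE are illustrative only. Run on the entry above, given as a single stream of characters the way the OED2 stores it, it should give the same indentation.

```python
import re

# A token is either a tag (<...>) or a run of character data between tags.
# Assumption: no '>' appears inside attribute values.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def pretty_print(sgml: str) -> str:
    """Each opening tag starts a new line, one space further in than its
    parent; closing tags and text stay on the current line."""
    lines = []
    current = ""  # output line under construction
    depth = 0     # number of elements currently open
    for token in TOKEN_RE.findall(sgml):
        if token.startswith("</"):
            # Closing tag: pop one level but stay on the current line,
            # so a run of consecutive closing tags ends up together.
            depth -= 1
            current += token
        elif token.startswith("<"):
            # Opening tag: finish the current line and start a new one.
            # rstrip() throws away any blank that preceded the tag.
            if current:
                lines.append(current.rstrip())
            current = " " * depth + token
            depth += 1
        else:
            # Character data is appended to the open line unchanged.
            current += token
    if current:
        lines.append(current.rstrip())
    return "\n".join(lines)


# The dictionary entry above as one unbroken stream of characters.
SAMPLE = (
    "<ME><hw>apple</hw><pos>noun</pos><senses><def num=1><m>a fruit of a "
    "tree</m><eg>Eve gave Adam an <cw>apple</cw> in the Garden of "
    "Eden</eg></def><def num=2><m>a tree on which "
    "<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of "
    "<cw>apple</cw> wood</eg></def></senses></ME>"
)

print(pretty_print(SAMPLE))
```

Note that the left-margin `reset' needs no bookkeeping of its own here: the open-element count already goes back down as each closing tag is seen.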

There are some problems here (as with all simple pretty-printing).
The tags <cw>, <xr>, and <sn> don't "really" require a new line
since they are "special" in this case, i.e. they are oriented more
toward display at the "word" level than at the document-structure
level. I think our feeling that a line break is inappropriate here
comes from our innate sense of what presentation markup should be
for "text". It would be appropriate for a `smart' pretty-printer to
have a list of tags that are `in-line' rather than organizational
and to pretty-print them differently.
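A `smart' variant only needs a set of in-line tag names to treat as running text. In this sketch the set {cw, xr, sn} is taken from the example above; where such a list would really come from (the DTD, a style sheet, a local convention) is an assumption, not something SGML or the TEI supplies. The helper names are hypothetical, and tag names are compared case-insensitively here.

```python
import re

TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")       # tag, or run of text between tags
NAME_RE = re.compile(r"</?\s*([A-Za-z0-9.-]+)")  # generic identifier of a tag


def tag_name(tag: str) -> str:
    """Return the tag's name, lower-cased so <CW> and <cw> match."""
    return NAME_RE.match(tag).group(1).lower()


def pretty_print(sgml: str, inline: frozenset = frozenset()) -> str:
    """Indent by element depth, but keep tags named in `inline` in the text."""
    lines = []
    current = ""
    depth = 0
    for token in TOKEN_RE.findall(sgml):
        if token.startswith("<") and tag_name(token) in inline:
            # In-line tag: handled like running text (no break, no indent change).
            current += token
        elif token.startswith("</"):
            depth -= 1
            current += token
        elif token.startswith("<"):
            if current:
                lines.append(current.rstrip())
            current = " " * depth + token
            depth += 1
        else:
            current += token
    if current:
        lines.append(current.rstrip())
    return "\n".join(lines)


SAMPLE = (
    "<ME><hw>apple</hw><pos>noun</pos><senses><def num=1><m>a fruit of a "
    "tree</m><eg>Eve gave Adam an <cw>apple</cw> in the Garden of "
    "Eden</eg></def><def num=2><m>a tree on which "
    "<xr>apples<sn>1</sn></xr> grow</m><eg>The box is made of "
    "<cw>apple</cw> wood</eg></def></senses></ME>"
)
INLINE_TAGS = frozenset({"cw", "xr", "sn"})  # assumed list, from the example

print(pretty_print(SAMPLE, INLINE_TAGS))
```

Because no line break now falls inside running text, both the blank before <cw> and the absence of one before <sn> survive in the output.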

There is another very subtle problem which probably most of you
missed. There are extra blanks in the SGML. Notice that
my SGML contained,

<eg>Eve gave Adam an <cw>apple</cw> in the Garden of Eden</eg>

rather than

<eg>Eve gave Adam an<cw>apple</cw>in the Garden of Eden</eg>

Why? Because I am relying on presentation markup for blank spacing
(and line breaks), i.e. assuming there is a blank between elements
of the text. That is fine UNTIL one comes to elements which in the
presentation markup might NOT have blanks between them, such as,

<xr>apples<sn>1</sn></xr> grow</m>

I.e., this `might' be presented as `apples-1' or `apples(1)' or
any of a number of other stylistic mechanisms. It might ALSO
not even be intended for printing in the presentation markup.

Thus, when I `pretty-print' this line as,

    <xr>apples
     <sn>1</sn></xr> grow</m>

I cannot go back to the original: the output no longer shows whether
there was a blank between `apples' and <sn>.
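The loss can be demonstrated directly. The sketch below assumes a hypothetical `un-pretty-printer' that removes the added indentation and rejoins the lines with one fixed separator. Applied to the two fragments quoted in this note, neither choice of separator recovers both originals.

```python
# Hypothetical un-pretty-printer: drop leading indentation from each line
# and rejoin the lines with a single fixed separator.
def unpretty(text: str, joiner: str) -> str:
    return joiner.join(line.lstrip() for line in text.splitlines())


# Originals, and their pretty-printed forms (one line break each).
EG_ORIG = "<eg>Eve gave Adam an <cw>apple</cw> in the Garden of Eden</eg>"
EG_PRETTY = "<eg>Eve gave Adam an\n <cw>apple</cw> in the Garden of Eden</eg>"
XR_ORIG = "<xr>apples<sn>1</sn></xr> grow</m>"
XR_PRETTY = "<xr>apples\n <sn>1</sn></xr> grow</m>"

for joiner in (" ", ""):
    ok_eg = unpretty(EG_PRETTY, joiner) == EG_ORIG
    ok_xr = unpretty(XR_PRETTY, joiner) == XR_ORIG
    print(repr(joiner), ok_eg, ok_xr)
# prints:
# ' ' True False
# '' False True
```

Reading a line break as a blank restores <cw> but wrongly inserts one before <sn>; reading it as nothing does the reverse.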

---
This is why I am a bit concerned about the implicit use of presentation
markup in SGML text. However, it is probably preferable that something
be said about the assumptions rather than everything left implicit.


Michael S. Hart
On Tue, 13 Nov 90 11:23:00 -0500 Robert A Amsler said:

>Michael Hart (Project Gutenberg) writes:
>
>I propose that the Text Encoding Initiative include, as part of their
>guidelines, programs, operations, etc., the inclusion of a requirement
>that access to TEI texts by word processors, search and retrieval programs,
>simple TYPE, LIST, GREP, CAT and other commands, so the great majority of
>computer users may benefit from these etexts.
>
>
>The major problem with this is that the TEI doesn't intend to produce
>any texts--and in fact has no direct plans yet to produce any
>software. The TEI probably should have been named TESI (The Text
>Encoding Standards Initiative) to have avoided this confusion, but
>TEI is the name.
>
Mr. Amsler would have you believe that I stated "that the TEI intends to
produce any texts," which, as you can see, I neither said nor implied.  He
would also have you believe that not producing any etexts has a bearing on
the issue.  The fact is that I am proposing that the TEI guidelines mean
and be what they should be, something providing Initiative for Encoding,
and not just encoding for the use of a single-digit percentage of users.
If, as Mr. Amsler suggests, the name is inappropriate, then it should be
changed, even though he implies it is written in stone when, in fact, it
is written in electronic etext (sometimes transferred to paper).

>The second issue is whether in fact it is necessary that new
>standards fit old software? I.e. one could just as easily say
>that all WYSIWYG word processors should include commands for
>line-oriented editing or for batch-processing updates from
>command files.
>
Another dead herring:  this time asking you, the reader, to fallaciously
equate a simple proposal for the release of texts in those universally
available formats known as "DOS text files," etc., with a proposal "that
all WYSIWYG word processors should include commands for line-oriented
editing or for batch-processing updates from command files."
The real issue here is whether or not TEI and its associated others can
and should create guidelines which restrict practical usage of TEI text
to a single-digit percentage of computer users.  The alternative is the
creation and distribution of etexts which would, could, and should have
their home in virtually all computers around the world.

>One could also claim that newer SGML-smart software will have no
>trouble doing these tasks and that it is the job of the TEI to
>promote abandonment of software which does not support SGML.
>
>To be conciliatory, there is merit to two points here.
>
>*** First, `Is there a presentation markup isomorphic to the TEI
>*** Guidelines?'
>
>*** Second, `Should there be a presentation markup
>*** format for the display of text with its SGML tags?'
>
>The first issue is, in my opinion, the appropriate restatement of the
>question Michael Hart asked long ago about how to `strip out' the
>SGML tags.

Actually, this question was still under discussion last month, which of
course belies another fallacious argument: that you should ignore all the
points because they are old.  Fallacious on two counts:  one, that being
old somehow strips a point of its truth; two, that this issue is "long ago
about how to `strip out' the SGML tags."  Not to mention that one might
not want, perhaps should not want, a "restatement of (one's) question,"
particularly when one has not been consulted.

>           The answer is that the TEI hasn't in fact specified
>ANY presentation markup for its SGML Guidelines. SGML doesn't specify
>how to print text. (SGML isn't in fact about the printing of text;
>it is about the preservation of the content of text).
>
Actually, I have never mentioned "the printing of text," only the
way the text appears to the eye when viewed as a standard text file
using the standard modes for reading text files.  Of course, in this
mode it might become more easily printable.

>I am personally skeptical that one can acceptably represent the TEI
>Guidelines merely by using blank space and line breaks. It would
>certainly be challenging to try to do so, something like trying to
>flatten a hypertext in an acceptable manner. The goal is not without
>merit and probably deserves more discussion.
>
For the umpteenth time I must correct this misquotation:  the proposal
is only to include the easy use of these texts ALONG WITH THE OTHER TEI
GUIDELINES, NOT TO REPLACE ANY OTHER GUIDELINES.  Access is the point
of the proposal: access for those who have normal access to normal
programs for reading, searching or retrieving text files on relatively
"normal" computers, if such a thing exists.

>The second issue is one that I think does need some discussion.
>How should an SGML text be stored in a file? SGML doesn't say.
>As I've mentioned before, the OED2 is stored as a single stream
>of characters without any carriage-returns or padded blanks for
>presentation. SGML can be ``pretty-printed'' according to a fairly
>simple algorithm, i.e.,
      xxxxxxxx  Many lines about ``pretty-printing'' deleted. xxxx
>---
>This is why I am a bit concerned about the implicit use of presentation
>markup in SGML text. However, it is probably preferable that something
>be said about the assumptions rather than everything left implicit.

End of Mr. Amsler's note.

My note must also end here, and perhaps should have ended before it
began.  I fear this may not have been the best way to deal with the
matter, and welcome assistance.

Thank you,
Michael S. Hart