Having just returned from being out of town to the most recent TEI-L
discussion, I have found, figuratively, the bleeding body of Bryan
Cholfin's note, shot by Bob Amsler in his reply. Fortunately, e-bullets
can be extracted and rarely do bodily harm; this is the more fortunate,
in that I think RA may have shot prematurely, and that BC's and RA's
views are not in fact in conflict.
The purpose of the TEI, according to the funding proposal we sent to NEH
last year, is the development and dissemination of guidelines for the
preparation and interchange of machine-readable texts; these guidelines
must be (again, paraphrasing the proposal):
suitable for interchange of already-existing texts
suitable for guidance in encoding new texts
flexible (guidelines, not rigid requirements)
device- and software-independent
For my part, I think Bryan Cholfin is wholly correct to distinguish
rigorously between ASCII-only text and markup-free text. Normal TEI
texts are limited to a subset of the ASCII characters (a subset shared
by most machines and character sets, even those in non-Anglophone
countries and those outside the ISO world, like EBCDIC) and thus
necessarily can be inspected with TYPE and similar programs. For this
reason, I have found puzzling Michael Hart's recent request that the TEI
guidelines require conforming texts to be manipulable with TYPE, GREP,
CAT, etc., since, as has been repeatedly explained on this list, the
character-set chapter already contains an explicit requirement which has
that effect. I believe MH objects not to the (non-existent) non-ASCII
characters, but to the markup itself, finding that the presence of
angle-bracket-delimited text makes the file unusable. Since others
find data unusable without the information in question, there appears
to be no compromise on this point.
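The conflict described here can be made concrete in a few lines of code. The tag names below are invented for illustration and are not taken from the actual TEI tag set; the point is only that a naive substring search (roughly what TYPE- and GREP-style use amounts to) fails on marked-up text, while stripping the markup restores searchability at the cost of the structural information:

```python
import re

# Hypothetical TEI-style fragment; the tag names are invented for
# illustration and are not the actual TEI tag set.
marked_up = "<sp who='Hamlet'><l>To be, or <emph>not</emph> to be</l></sp>"

# A naive substring search fails, because markup interrupts the phrase:
print("not to be" in marked_up)        # False

# Stripping the angle-bracket markup restores searchability:
plain = re.sub(r"<[^>]+>", "", marked_up)
print("not to be" in plain)            # True

# ...but the structural information (who speaks the line) is lost:
print("Hamlet" in plain)               # False
```

Both camps are right about their own use: the stripped text serves the searcher, the markup serves anyone who needs to know who is speaking.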
I'm not sure exactly what BC means when he says
Now, -some- standard file format -does- need to be defined, so
that application programmers can build those filters into their
software, and be able to make the software output files that
other SGML-sensitive software can make use of.
I believe the information required by application programmers is
contained in the definition of SGML (ISO 8879) and in the formal
document type declarations in the appendix of the TEI guidelines.
Further specification (e.g. restrictions as to line length or
requirements as to disposition of white space in the file) would seem to
me to be unnecessary and pernicious, since it has nothing to do with the
application-independent information in the file. (The difference
between this position and that propounded by RA in his note is, I
believe, merely one of degree: I lay more stress on SGML's
specification of what the SGML data stream looks like, and he much less.
In an absolute sense, he is right: the SGML standard explicitly states
that the physical form of the input stream is not restricted, though at
least part of it must conform to ISO 646 or ASCII, and the rest has to
be electronic character data if most of the standard is to be
applicable.)
To the comments of Willard McCarty and others on the desirability of
some common, predefined specification for the presentation of text
encoded with TEI tags, all I can say is 'yes, that would clearly
make TEI texts more usable, and thus would help ensure that the TEI
scheme is widely adopted', and urge them to lay pen to paper or finger
to keyboard to develop such a specification, and then share it with us.
There are several reasons I think such a presentation-specification
should not come from the TEI itself: it's too close to software
development (for which we lack the resources and mandate), it's a
specification of one user interface (and the TEI would risk having the
one interface confused with the underlying encoding, and having people
decline to use the TEI scheme 'because it puts two blank lines after the
section title and I hate that'), and it would distract our slender
resources from the crucial and difficult specification of the tag set
and structure into an enterprise which, however useful, does not pose
any intrinsic conceptual difficulties. Presentation as a function of
markup has been specified for a long time by formatters and structured
editors, and the problem of how to specify the desired presentation has
been solved by troff, Waterloo and IBM GML, Author/Editor, CheckMark,
Nota Bene, any word processor with style sheets, DynaText, and many
others. It should not be beyond the wit of some subscriber to this list
to implement the TEI scheme or some significant subset thereof in one or
the other of these programs. We would welcome such enterprise, and I
can assure you there will be space on the server for whatever you
develop and wish to share (even if I have to learn UUENCODE and UUDECODE
to handle it!). If no volunteers are found willing, perhaps that will
be a sign that the community is not as interested in the problem as one
might suppose.
Any volunteers who run into interpretive problems understanding the
guidelines are hereby assured that any inquiries they have will be
answered.
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
Michael S. Hart's response to:
"The Shot Heard 'Round the World"
This is in two parts, one referring to the NEH proposal (a copy should be
on the way for me to quote in more detail), the other in response to S-Ms
restatement of my proposal (the latest in a series of TEI-L restatements,
which I would much prefer to be quotations. There might be a reason.)
On Tue, 20 Nov 90 15:04:13 CST Michael Sperberg-McQueen 312 996-2477 -2981 said:
>Having just returned from being out of town to the most recent TEI-L
>discussion, I have found, figuratively, the bleeding body of Bryan
>Cholfin's note, shot by Bob Amsler in his reply. Fortunately, e-bullets
>can be extracted and rarely do bodily harm; this is the more fortunate,
>in that I think RA may have shot prematurely, and that BC's and RA's
>views are not in fact in conflict.
>The purpose of the TEI, according to the funding proposal we sent to NEH
>last year, is the development and dissemination of guidelines for the
>preparation and interchange of machine-readable texts; these guidelines
>must be (again, paraphrasing the proposal):
> suitable for interchange of already-existing texts
> suitable for guidance in encoding new texts
> flexible (guidelines, not rigid requirements)
> device- and software-independent
I would like to respond to these in order.
1. suitable for interchange of already-existing texts
Obviously before the advent of TEI, TEI-L, SGML, etc., there were no
etexts encoded in these formats, with these markups or what have you.
Therefore such interchange would have to include etexts which could
not be marked up according to such standards. These etexts include
standard DOS text as output by Microsoft Word, Word Perfect, and/or
other word processors by default.
2. flexible (guidelines, not rigid requirements)
This would preclude the "doctrinaire" positions Lou Burnard adopted
in his responses to these issues, and would preclude ANY positions,
no matter by whom they are taken, which would be inflexible. It also
should preclude enforcement of the guidelines, lest they turn into
the "rigid requirements" prohibited above. Of course, this precludes
a guideline for the inclusion of normal text (i.e. texts which can
be searched with the search and find functions of "normal" programs
on "normal" machines). (More about this later, but please quote me
accurately. I am tired of this point being restated, restated,
restated, to mean everything but what it means.)
3. device- and software-independent
This means the arguments that any serious researcher should have one
of the machines necessary for SGMLing are contrary to the proposal.
This means the arguments that any serious researcher should have one
of the programs necessary for SGMLing are contrary to the proposal.
Device independent means the files should be utilizable on various
machines, as widely varying as is feasible. Any effort limiting them
to certain hardware configurations is to be eschewed. Programs, also.
Software independent means the files should be utilizable under major,
and even semi-major, software methodologies. Any effort limiting them
to certain software configurations is to be eschewed to the same or a
greater degree. However, these two were linked, not just by their
inclusion together in the NEH proposal, but before that by the natural
evolution of hardware and software. Perhaps a great majority of the
use of etexts is to search the text for quotations, or more loosely
for portions of text which include certain words in certain contexts.
These contexts are usually defined in terms of proximity within a
certain range of characters, words, lines or any other definitions
within the realms of the user and the program. A text which has been
marked up has great value, a value which I must not be said ever to
deny, but there were other values, values which I pointed out could
not be utilized by any of the normal programs I see in use on normal
machines. To use the search or find portions, or sort features, or
most of the more powerful features included in most of the word
processing and search programs, one must have text which does not
include markup. Therefore text could be released in both marked up
and not marked up formats. I prefer this to each user running a
program to strip the file, which would be a waste of time and
resources to do over and over again. Better yet, I am, in association
with others who are real programmers, working on the production of a
program which will present a text file in several of the manners
discussed above, in addition to being able to present a single file,
with multiple markups, as either a first edition, or a second edition,
etc. This will drastically reduce the space needed to store multiple
editions for comparison.
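A minimal sketch of the edition-selecting idea described above might look like the following. The apparatus-style markup here is wholly hypothetical (it is not the TEI's actual mechanism for variant readings, and the program mentioned above may work quite differently); the point is only that one file carrying multiple readings can be rendered as either edition:

```python
import re

# One file, two editions: apparatus-style markup with invented tag
# names (NOT the TEI's actual variant-reading mechanism).
source = ("It was the best of <rdg ed='1'>tymes</rdg>"
          "<rdg ed='2'>times</rdg>, it was the worst of times.")

def render(text, edition):
    """Present the file as a single edition: keep the readings tagged
    for the requested edition, drop the readings for the others."""
    keep = r"<rdg ed='%s'>([^<]*)</rdg>" % re.escape(edition)
    text = re.sub(keep, r"\1", text)    # unwrap the wanted readings
    text = re.sub(r"<rdg ed='[^']*'>[^<]*</rdg>", "", text)  # drop the rest
    return text

print(render(source, "1"))   # first edition, reading "tymes"
print(render(source, "2"))   # second edition, reading "times"
```

Storing only the variant readings once, rather than whole duplicate texts, is what yields the space saving for multiple editions.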
This part has become long enough. I don't want to lose attention,
attention required for the formulation of improved etext files.
You may wish to treat this second portion as a separate note: even
though it is in response to the same note as above.
>necessarily can be inspected with TYPE and similar programs. For this
>reason, I have found puzzling Michael Hart's recent request that the TEI
>guidelines require conforming texts to be manipulable with TYPE, GREP,
>CAT, etc., since, as has been repeatedly explained on this list, the
>character-set chapter already contains an explicit requirement which has
>that effect. I believe MH objects not to the (non-existent) non-ASCII
>characters, but to the markup itself, finding that the presence of
>angle-bracket-delimited text makes the file unusable. Since others
>find data unusable without the information in question, there appears
>to be no compromise on this point.
I am not asking for a compromise. I never have. I only ask that the
users of normal programs on normal computers not be denied the use of
their program features in conjunction with these etexts.
I have not objected to anything, only requested that something in the
way of increased utilization be included. When users do a "search"
or "find" or "sort" or any of the various other features available in
most of the text and word processing programs, the markups can get in
the way and prevent even some of the simplest quotations from being a
result of the search. I am sure you all have been made aware of this
situation in a variety of experiences when a search did not yield the
quotation you could already see in front of you on the screen. Would
these situations not be reduced by allowing various and sundry others
in the world of search software to have a go at it?
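One way search software could "have a go at it" without stripping the file first is a markup-tolerant search. The sketch below (with an invented tag in the sample text, as an assumed example) simply allows optional angle-bracket markup between the words of the target phrase:

```python
import re

# Sample line with an invented tag interrupting the phrase:
text = "<l>To be, or <emph>not</emph> to be</l>"

def find_phrase(phrase, text):
    """Search for a phrase, ignoring any angle-bracket markup that
    falls between its words."""
    gap = r"(?:\s|<[^>]+>)+"   # whitespace and/or tags between words
    pattern = gap.join(re.escape(word) for word in phrase.split())
    return re.search(pattern, text) is not None

print(find_phrase("not to be", text))   # True: found despite the markup
print("not to be" in text)              # False: a naive search fails
```

This handles the case of a quotation visible on the screen that a plain substring search cannot find.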
Why would you want to limit the utilization of electronic texts? Why
would you want to keep them away from the millions of students with a
normal access to normal computers with normal programs?
This is not the "flexible"
"device- and software-independent"
"suitable for interchange of already-existing texts."
> *****remainder of original note deleted*****
Thank you for your interest,
Michael S. Hart, Director, Project Gutenberg
INTERNET: [hidden email]
BITNET: [hidden email]
The views expressed herein do not necessarily reflect
the views of any person or institution. Neither Prof
Hart nor Project Gutenberg have any official contacts
with the University of Illinois.
"NOTICE: Due to the shortage of ROBOTS and COMPUTERS
some of our workers are HUMAN and therefore will act
unpredictably when abused."