Having just returned from being out of town to the most recent TEI-L discussion, I have found, figuratively, the bleeding body of Bryan Cholfin's note, shot by Bob Amsler in his reply. Fortunately, e-bullets can be extracted and rarely do bodily harm; this is the more fortunate, in that I think RA may have shot prematurely, and that BC's and RA's views are not in fact in conflict.

The purpose of the TEI, according to the funding proposal we sent to NEH last year, is the development and dissemination of guidelines for the preparation and interchange of machine-readable texts; these guidelines must be (again, paraphrasing the proposal):

   suitable for interchange of already-existing texts
   suitable for guidance in encoding new texts
   flexible (guidelines, not rigid requirements)
   extensible
   device- and software-independent
   language-independent
   application-independent

For my part, I think Bryan Cholfin is wholly correct to distinguish rigorously between ASCII-only text and markup-free text. Normal TEI texts are limited to a subset of the ASCII characters (a subset shared by most machines and character sets, even those in non-Anglophone countries and those outside the ISO world, like EBCDIC) and thus necessarily can be inspected with TYPE and similar programs. For this reason, I have found puzzling Michael Hart's recent request that the TEI guidelines require conforming texts to be manipulable with TYPE, GREP, CAT, etc., since, as has been repeatedly explained on this list, the character-set chapter already contains an explicit requirement which has that effect. I believe MH objects not to the (non-existent) non-ASCII characters, but to the markup itself, finding that the presence of angle-bracket-delimited text makes the file unusable. Since others find data unusable without the information in question, there appears to be no compromise on this point.
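
To make the point at issue concrete, here is a minimal sketch of how angle-bracket markup can defeat a plain substring search, and how a simple filter restores it. Everything in it is an assumption made for illustration: the sample sentence, the tag names, and the helper functions `strip_markup` and `contains_phrase` are hypothetical and come from neither the TEI guidelines nor any message in this thread. The regular expression handles only simple tags, not entity references, comments, or marked sections.

```python
import re

# Hypothetical sample text; the tag names are illustrative only.
SAMPLE = ("<p>It was the <emph>best</emph> of times, "
          "it was the <emph>worst</emph> of times.</p>")

# Matches one angle-bracket-delimited tag (a deliberately naive pattern).
TAG = re.compile(r"<[^<>]*>")


def strip_markup(text: str) -> str:
    """Remove angle-bracket-delimited tags, leaving the character data."""
    return TAG.sub("", text)


def contains_phrase(text: str, phrase: str) -> bool:
    """Plain substring search, roughly what a 'find' command does."""
    return phrase in text


# The tags interrupt the phrase, so the naive search misses it.
print(contains_phrase(SAMPLE, "the best of times"))                # False
# After filtering out the tags, the same search succeeds.
print(contains_phrase(strip_markup(SAMPLE), "the best of times"))  # True
```

A filter of this kind is the sort of thing either side could run: the marked-up file stays the interchange form, and the tag-free view is derived from it on demand.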

I'm not sure exactly what BC means when he says

   Now, -some- standard file format -does- need to be defined, so that application programmers can build those filters into their software, and be able to make the software output files that other SGML-sensitive software can make use of.

I believe the information required by application programmers is contained in the definition of SGML (ISO 8879) and in the formal document type declarations in the appendix of the TEI guidelines. Further specification (e.g. restrictions as to line length or requirements as to disposition of white space in the file) would seem to me to be unnecessary and pernicious, since it has nothing to do with the application-independent information in the file. (The difference between this position and that propounded by RA in his note is, I believe, merely one of degree: I lay more stress on SGML's specification of what the SGML data stream looks like, and he much less. In an absolute sense, he is right: the SGML standard explicitly states that the physical form of the input stream is not restricted, though at least part of it must conform to ISO 646 or ASCII, and the rest has to be electronic character data if most of the standard is to be interpretable.)

To the comments of Willard McCarty and others on the desirability of some common, predefined specification of a standard specification of text encoded with TEI tags, all I can say is 'yes, that would clearly make TEI texts more usable, and thus would help ensure that the TEI scheme is widely adopted' and urge them to lay pen to paper or finger to keyboard to develop such a specification, and then share it with us here.

There are several reasons I think such a presentation-specification should not come from the TEI itself: it's too close to software development (for which we lack the resources and mandate), it's a specification of one user interface (and the TEI would risk having the one interface confused with the underlying encoding, and having people decline to use the TEI scheme 'because it puts two blank lines after the section title and I hate that'), and it would distract our slender resources from the crucial and difficult specification of the tag set and structure into an enterprise which, however useful, does not pose any intrinsic conceptual difficulties.

Presentation as a function of markup has been specified for a long time by formatters and structured editors, and the problem of how to specify the desired presentation has been solved by troff, Waterloo and IBM GML, Author/Editor, CheckMark, Nota Bene, any word processor with style sheets, DynaText, and many others. It should not be beyond the wit of some subscriber to this list to implement the TEI scheme or some significant subset thereof in one or the other of these programs. We would welcome such enterprise, and I can assure you there will be space on the server for whatever you develop and wish to share (even if I have to learn UUENCODE and UUDECODE to handle it!).

If no volunteers are found willing, perhaps that will be a sign that the community is not as interested in the problem as one might expect. Any volunteers who run into interpretive problems understanding the guidelines are hereby assured that any inquiries they have will be answered.

-Michael Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago

Michael S. Hart's response to: "The Shot Heard 'Round the World"

This is in two parts, one referring to the NEH proposal (a copy should be on the way for me to quote in more detail), the other in response to S-M's restatement of my proposal (the latest in a series of TEI-L restatements, which I would much prefer to be quotations. There might be a reason.)

On Tue, 20 Nov 90 15:04:13 CST Michael Sperberg-McQueen 312 996-2477 -2981 said:
>Having just returned from being out of town to the most recent TEI-L
>discussion, I have found, figuratively, the bleeding body of Bryan
>Cholfin's note, shot by Bob Amsler in his reply. Fortunately, e-bullets
>can be extracted and rarely do bodily harm; this is the more fortunate,
>in that I think RA may have shot prematurely, and that BC's and RA's
>views are not in fact in conflict.
>
>The purpose of the TEI, according to the funding proposal we sent to NEH
>last year, is the development and dissemination of guidelines for the
>preparation and interchange of machine-readable texts; these guidelines
>must be (again, paraphrasing the proposal):
>
> suitable for interchange of already-existing texts
> suitable for guidance in encoding new texts
> flexible (guidelines, not rigid requirements)
> extensible
> device- and software-independent
> language-independent
> application-independent
>

I would like to respond to these in order.

1. suitable for interchange of already-existing texts

Obviously before the advent of TEI, TEI-L, SGML, etc, there were no etexts encoded in these formats, with these markups or whathaveyou. Therefore such interchange would have to include etexts which could not be marked up according to such standards. These etexts include standard DOS text as output by Microsoft Word, Word Perfect, and/or other word processors by default.

2.
flexible (guidelines, not rigid requirements)

This would preclude the "doctrinaire" positions Lou Burnard adopted in his responses to these issues, and would preclude ANY positions, no matter taken by whom, which would be inflexible. It also should preclude enforcement of the guidelines, else they transgress to the "rigid requirements" prohibited above. Of course, this precludes a guideline for the inclusion of normal text (i.e. texts which can be searched with the search and find functions of "normal" programs on "normal" machines)(more about this later, but accurate quotation, please. I am tired of this point being restated, restated, restated, to mean everything but what it means.)

3. device- and software-independent

This means the arguments that any serious researcher should have one of the machines necessary for SGMLing are contrary to the proposal. This means the arguments that any serious researcher should have one of the programs necessary for SGMLing are contrary to the proposal.

Device independent means the files should be utilizable on various, as widely varying as is feasible, machines. Any effort limiting to certain hardware configurations is to be eschewed. Programs, also.

Software independent means the files should be utilizable on major, and even semi-major software methodologies. Any effort limiting to certain software configurations is to be eschewed to the same or greater degree.

However, these two were linked, not just by their inclusion together in the NEH proposal, but before that they were linked in the natural evolution of hardware and software.

Perhaps a great majority of the use of etexts is to search the text for quotations, or more loosely for portions of text which include certain words in certain contexts. These contexts are usually defined in terms of a proximity within a certain range of characters, words, lines or any other definitions within the realms of the user and the program.
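
The kind of proximity search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `words` and `within`, the tokenizing rule, and the sample sentence are hypothetical, and no particular search program of the period is being reproduced. Proximity is measured here in words; characters or lines would work the same way with a different unit.

```python
import re


def words(text: str) -> list[str]:
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text: str, first: str, second: str, max_gap: int) -> bool:
    """True if `first` and `second` occur within `max_gap` words of each other."""
    tokens = words(text)
    pos_first = [i for i, w in enumerate(tokens) if w == first.lower()]
    pos_second = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


# Hypothetical sample; "compare" and "day" are five words apart.
SAMPLE = "Shall I compare thee to a summer's day?"
print(within(SAMPLE, "compare", "day", 5))  # True
print(within(SAMPLE, "compare", "day", 4))  # False
```

Note that a tokenizer like this one treats any tag names left in the text as ordinary words, which widens the measured distances; that is the practical effect the surrounding argument is concerned with.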

A text which has been marked up has great value, a value which I must not be said ever to deny, but there were other values, values which I pointed out could not be utilized by any of the normal programs I see in use on normal machines.

To use the search or find portions, or sort features, or most of the more powerful features included in most of the word processing and search programs, one must have text which does not include markup.

Therefore text could be released in both marked up and not marked up formats. I prefer this to program utilization by each user to strip the file, which would be a waste of time and resources to do over and over again.

Better yet, I am, in association with others who are real programmers, working on the production of a program which will present a text file in several of the manners discussed above, in addition to being able to present a single file, with multiple markups, as either a first edition, or a second edition, etc. This will drastically reduce the space needed to store multiple editions for comparison.

This part has become long enough. I don't want to lose attention, attention required for the formulation of improved etext files. You may wish to treat this second portion as a separate note: even though it is in response to the same note as above.

Lines deleted*****
>necessarily can be inspected with TYPE and similar programs. For this
>reason, I have found puzzling Michael Hart's recent request that the TEI
>guidelines require conforming texts to be manipulable with TYPE, GREP,
>CAT, etc., since, as has been repeatedly explained on this list, the
>character-set chapter already contains an explicit requirement which has
>that effect. I believe MH objects not to the (non-existent) non-ASCII
>characters, but to the markup itself, finding that the presence of
>angle-bracket-delimited text makes the file unusable.
>Since others find data unusable without the information in question,
>there appears to be no compromise on this point.

I am not asking for a compromise. I never have. I only ask that the users of normal programs on normal computers not be denied access for their program features to be used in conjunction with these etexts. I have not objected to anything, only requested that something in the way of increased utilization be included.

When users do a "search" or "find" or "sort" or any of the various other features available in most of the text and word processing programs, the markups can get in the way and prevent even some of the simplest quotations from being a result of the search. I am sure you all have been made aware of this situation in a variety of experiences when a search did not yield the quotation you could already see in front of you on the screen. Would these situations not be reduced by allowing various and sundry others in the world of search software to have a go at it?

Why would you want to limit the utilization of electronic texts? Why would you want to keep them away from the millions of students with a normal access to normal computers with normal programs?

This is not the "flexible" "device- and software-independent" "proposal" "suitable for interchange of already-existing texts."

> *****remainder of original note deleted*****

Thank you for your interest,

Michael S. Hart, Director, Project Gutenberg
INTERNET: [hidden email]
BITNET: [hidden email]

The views expressed herein do not necessarily reflect the views of any person or institution. Neither Prof Hart nor Project Gutenberg have any official contacts with the University of Illinois.

"NOTICE: Due to the shortage of ROBOTS and COMPUTERS some of our workers are HUMAN and therefore will act unpredictably when abused."