[ The following exchange between Professor Stig Johansson of Oslo University and Dr Robert Amsler of Bellcore may be of wider interest, and I have therefore taken the liberty of posting it to this list, without asking permission of either of them. Comments, reactions, even flames, welcomed. Lou Burnard, Assoc Ed, TEI ]

====

Date: Wed, 11 Jul 90 13:01:21 +0100
From: Stig Johansson <h_johansson%[hidden email]>
Subject: How far SGML?
Message-Id: <805*[hidden email]>

Does everything have to be expressed through SGML? Is it good for all types of texts? Cf the recent book by Herwijnen (1990: 12): 'A tree representation can be used to describe many documents, but not all.'

Even if SGML can be made to do the work, is it the best way for all texts and all types of textual features? For example, can it be used for the discourse transcription needed in a spoken corpus without cluttering up the text hopelessly? Why not try to convert a discourse transcription to SGML-type tagging?

You will say - everything does not need to be keyboarded - there are possibilities of tag minimization - tags can be suppressed, etc. But we do want to be able to see the text simultaneously tagged for speaker overlap, stress, intonation, etc.

You will say - but SGML is for the interchange format - it can be converted to/from a local format. The best thing would be, however, if we could have a usable interchange format or, at least, if we could minimize the difference between the interchange format and local formats. Otherwise, who will bother to use the interchange format?

What I am really asking is: what harm does it do if we can find some simpler way of expressing, for example, intonation or word class?

Finally, are we really sure that we can define in advance a grammar for textual features? This is what the DTD does (for a particular type of text).

These are some of the questions that need careful consideration. Or am I just being stupid?
Oslo 25 June 1990 Stig Johansson

=============

From: Robert A Amsler <[hidden email]>
Subject: Re: 3d note from SJ

Re: Stig's concerns over whether SGML should always be used to represent all texts.

There has been one category of material which from the beginning I have concluded should not be represented in SGML. From this category I believe one can abstract a general principle of what types of material are not best served in SGML garb. The category was programming language code. The reason was that since a compiler existed for such code, it was not only unnecessary to convert such material into SGML, it was outright harmful.

Since then, I've come to conclude that DBMS loader stream information likewise should not be converted into SGML tagging. The reasoning is similar, that it is a formal language already, which is checked by its own software. In the case of database data, however, it was concluded that there might be another form of the information from which the database loader stream could be derived, and that this other format might itself be an SGML format.

The general principle seems to be that what makes using SGML worthwhile is that software exists to make use of the format and software exists to validate the content of the format. Whenever other software exists for another format, the utility of SGML is lessened to the degree that this other software provides all the formal computational capabilities that SGML provides.

However, there is another principle which should be stated. It is that most criteria which concern `readability' or `legibility' for human users of text are in fact derivable by software from less readable and less legible forms of information which are more useful to computers.

Using a computer is a bit of a Faustian deal. We gain capabilities to do things, but lose a bit of the ability actually to understand the raw information from which these capabilities derive.

==============================