I have heard at least two arguments against the SGML entities:
(1) they expand texts prohibitively; (2) they are difficult to read.

To test these arguments, I have just re-coded Maupassant's "Menuet" (French), and I thought you might be interested in the results:

Type of coding                                # chars  Expansion
--------------------------------------------- -------  ---------
Original text, accents in the Macintosh set      9169
Accents coded as SGML entities                  10593     115.5%
Accents coded a` la TeX (e grave = \`e, etc.)    9585     104.5%

I tried the TeX-like coding because many people working on French use some home-made scheme of this kind. It seems to be the most compact ISO 646 representation one can find without too many ambiguities to resolve. The difference between it and the supposedly very wasteful SGML entity coding is small (115.5% versus 104.5% of the original size), nothing like multiplying the size of the text by three or four. The first argument therefore does not hold, at least for French.

As for the second argument, I have of course heard many times the counter-argument that this type of encoding is not meant to be read by humans but only serves for transmission. Unfortunately, most people I know who work on French have to deal with these encodings at one time or another, simply because nobody yet has the software to do all the necessary conversions. This speaks strongly for the development of public-domain software to perform these tasks; I have the feeling that the success of the TEI depends in large part on the availability of such software for free, or cheaply.

Anyway, just as a test, here are the SGML and TeX-like versions of the same fragment.

J' ai cinquante ans. J' &eacute;tais jeune alors et j' &eacute;tudiais le droit. Un peu triste, un peu r&ecirc;veur, impr&eacute;gn&eacute; d' une philosophie m&eacute;lancolique, je n' aimais gu&egrave;re les caf&eacute;s bruyants, les camarades braillards, ni les filles stupides. Je me levais t&ocirc;t; et une de mes plus ch&egrave;res volupt&eacute;s &eacute;tait de me promener seul, vers huit heures du matin, dans la p&eacute;pini&egrave;re du Luxembourg.

J' ai cinquante ans. J' \'etais jeune alors et j' \'etudiais le droit. Un peu triste, un peu r\^eveur, impr\'egn\'e d' une philosophie m\'elancolique, je n' aimais gu\`ere les caf\'es bruyants, les camarades braillards, ni les filles stupides. Je me levais t\^ot; et une de mes plus ch\`eres volupt\'es \'etait de me promener seul, vers huit heures du matin, dans la p\'epini\`ere du Luxembourg.

The second one is probably easier to read, but not really wonderful either.
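The kind of free conversion software called for here can be sketched in a few lines. The Python below is a hypothetical illustration, not anything the author wrote: the names `SGML_ENTITIES`, `TEX_ACCENTS`, `encode`, `decode` and `expansion` are made up. It covers only the French lowercase accented letters, assumes ISO 8879 Latin-1 entity names (`&eacute;` and so on) on the SGML side and the home-made backslash convention of the sample on the TeX-like side, and computes expansion the same way as the table: encoded length as a percentage of the original length.

```python
import re

# Hypothetical mapping tables (names are illustrative, not from the post).
# SGML side: entity names from the ISO 8879 Latin-1 set (ISOlat1), plus
# escapes for "&" and "<", which could otherwise be read as markup.
SGML_ENTITIES = {
    "&": "&amp;", "<": "&lt;",
    "à": "&agrave;", "â": "&acirc;", "ç": "&ccedil;",
    "é": "&eacute;", "è": "&egrave;", "ê": "&ecirc;", "ë": "&euml;",
    "î": "&icirc;", "ï": "&iuml;", "ô": "&ocirc;",
    "ù": "&ugrave;", "û": "&ucirc;", "ü": "&uuml;",
}

# TeX-like side, following the home-made convention of the sample (\'e, \`e, \^o).
# Strict TeX would write a dotless i (\^{\i}); the simpler \^i is kept here.
TEX_ACCENTS = {
    "à": "\\`a", "â": "\\^a", "ç": "\\c{c}",
    "é": "\\'e", "è": "\\`e", "ê": "\\^e", "ë": '\\"e',
    "î": "\\^i", "ï": '\\"i', "ô": "\\^o",
    "ù": "\\`u", "û": "\\^u", "ü": '\\"u',
}


def encode(text: str, table: dict[str, str]) -> str:
    """Replace every mapped character by its ASCII coding; keep all others."""
    return "".join(table.get(ch, ch) for ch in text)


def decode(text: str, table: dict[str, str]) -> str:
    """Inverse of encode, done in one left-to-right pass so that the output
    of one replacement is never re-read (e.g. "&amp;lt;" becomes "&lt;")."""
    reverse = {code: ch for ch, code in table.items()}
    # Longest codes first, so a code is never shadowed by one of its prefixes.
    pattern = re.compile(
        "|".join(re.escape(code) for code in sorted(reverse, key=len, reverse=True))
    )
    return pattern.sub(lambda m: reverse[m.group(0)], text)


def expansion(original: str, encoded: str) -> float:
    """Encoded size as a percentage of the original, counted in characters
    (equal to bytes for an 8-bit character set such as the Macintosh one)."""
    return 100.0 * len(encoded) / len(original)


if __name__ == "__main__":
    sample = "J'étais jeune alors et j'étudiais le droit."
    for name, table in (("SGML", SGML_ENTITIES), ("TeX-like", TEX_ACCENTS)):
        coded = encode(sample, table)
        assert decode(coded, table) == sample  # round trip
        print(f"{name:9}{expansion(sample, coded):6.1f}%  {coded}")
```

Run as a script, it prints the expansion for one short sentence only. The figures in the table are the author's measurements on the whole story, and this sketch is not meant to reproduce them. The single-pass `re.sub` in `decode` is a deliberate choice: chained `str.replace` calls would turn an escaped `&amp;lt;` into `<` instead of the literal `&lt;`.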
I completely agree. The space requirements for keeping text in SGML,
compared to, say, producing PostScript output, are negligible.