ASCII e-text

Bryan Cholfin
If I've been following the argument about text-file formats correctly
(and you can shoot me if I haven't), there are actually two distinct (well,
semi-distinct) issues. The question of whether SGML-encoded files should
be distributed in ASCII format or not is relevant to the idea that the
TEI project is supposed to create a device and application independent
coding scheme (I don't have the TEI guidelines yet, so I may have the
details wrong).
In order to achieve the goal of device-independent text interchange, files
should be distributed in a form that (ideally, at least) any system could
read. So far, plain ASCII text files come closest to that (as far as I know).
Most word processors, editors and system utilities (like DOS TYPE) can input
ASCII files, even if they use a different format for their own work. But since
SGML is a coding scheme, even if the codes are ASCII, the particular
application is not going to be able to make any use of the information
encoded unless it has a filter/interpreter which understands the code.
So on a practical level, the only software or devices that really need to be
able to read SGML files are those that have the appropriate filters.
Now, -some- standard file format -does- need to be defined, so that
application programmers can build those filters into their software,
and be able to make the software output files that other SGML-sensitive
software can make use of.
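
To make the filter point concrete, here is a minimal sketch in Python. It
assumes a made-up SGML-like sample whose element names (title, p, name) are
purely illustrative and not taken from the TEI guidelines, and it handles only
non-nested, attribute-free elements. It is not a real SGML parser. Any program
can read the sample as ASCII, but only the small filter makes use of the codes.

```python
import re

# Hypothetical SGML-like sample; the element names are illustrative only.
SAMPLE = "<title>Moby Dick</title><p>Call me <name>Ishmael</name>.</p>"

# Matches a bare start or end tag such as <p> or </p> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def strip_tags(text):
    """What a tag-unaware program can do at best: throw the codes away."""
    return TAG.sub("", text)


def extract(text, element):
    """A tiny 'filter': return the text content of each <element>...</element>.

    Assumes the element does not nest inside itself and has no attributes.
    """
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(element)), re.S)
    return [TAG.sub("", body) for body in pattern.findall(text)]


if __name__ == "__main__":
    print(strip_tags(SAMPLE))       # the words survive, the structure is lost
    print(extract(SAMPLE, "name"))  # the filter recovers what the codes mean
```

The file itself stays plain ASCII either way; the difference lies entirely in
whether the reading program knows what the codes signify.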

Re: ASCII e-text

Robert A Amsler-2

The TEI's goal isn't device-independent text interchange; it is
presentation-independent content representation (for text interchange).
`Devices' are concerned with tasks like `printing', which, while
a desirable capability for text, is not at all a necessary
precondition for TEI text. It doesn't matter whether anyone knows
how to print something--it is a question of whether they know
what the information items in the text signify. Likewise, TEI
text might just as well be described as database-independent or
even application-independent (though I suppose that is a BIT strong
as there is no telling what your application might be for text).

ASCII is, after all, only an alphabet. Saying a text is in ASCII
is saying little more than that it uses alphanumerics and some
punctuation. The TEI, for example, doesn't assume anything about the
control characters--the ``unprintable'' characters. Even carriage-return
is optional. In some sense the TEI and its standards go further than
ASCII, assuming only a printable subset of it.
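
As a rough illustration of what "only a printable subset" means in practice,
the sketch below (Python) reports every character that falls outside the
printable ASCII range 0x20 through 0x7E. The function name and the
allow_newlines switch are my own assumptions for the example, not anything the
TEI defines; line ends are tolerated by default only because most files on
disk contain them.

```python
def non_printable_ascii(text, allow_newlines=True):
    """Return (offset, char) pairs for characters outside printable ASCII.

    Printable ASCII is taken here as code points 0x20 (space) to 0x7E (~).
    With allow_newlines=True, LF and CR are tolerated as line separators.
    """
    allowed_extra = {"\n", "\r"} if allow_newlines else set()
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if not (0x20 <= ord(ch) <= 0x7E) and ch not in allowed_extra
    ]


if __name__ == "__main__":
    # A tab is a control character, so it is reported with its offset.
    print(non_printable_ascii("abc\tdef"))
```

An empty result means the text stays within the printable subset and makes no
use of control characters at all.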

SGML really doesn't care about `some file format' as it doesn't deal
with physical things at all. Of course, there is no such thing as an
abstract magnetic medium, and when you render text machine-readable
it matters how and on what you enter it. However, here the TEI
doesn't intend to tell you how to render it machine-readable since
the TEI doesn't intend to actually create any text--only the
standards for the abstract (ASCII-subset) representation of the
content in the text.

The TEI is really just like the style guides you buy to help you
write documents that conform to good writing practices. Style guides
don't tell you whether to use a word processor or a typewriter or
even a pencil and paper. They only address things like how to
represent the name of a musical note in text, what the abbreviation
for `und so weiter' is, what the difference between a figure and a
table is, how to denote the elements in a two-level index entry,