text editors, stripping, etc.

Christopher Currie
     My requests for information on markup-stripping text editors and on
automatic SGML markup have proved fruitful. To sum up what seems
to emerge from the correspondence so far:

1. It is impossible to perform these functions quickly and easily on
the widely used microcomputer word processors on which the majority of
academics and others currently write books or edit texts (where they
are using computers at all). Remember, in the U.K. most people working
in the humanities can't even afford a PC: they get by with an Amstrad
PCW. The situation must be worse in, e.g., India.

2. This restricts effective use of SGML markup at present, and
reception of SGML-marked texts, to the elite who have access to more
powerful systems, and the know-how and skill to use them. People
who already have texts prepared by a word processor aren't going
to be willing to spend months inserting SGML markup by hand; nor
will those who want human-readable texts be willing to accept
them in marked-up form, as Michael Hart points out.


3. Even on UNIX etc. systems, it seems doubtful whether the text
editors are up to the straightforward tasks I mentioned.  (To the
average academic user, it will seem that the tasks *ought* to be
straightforward, even if from a programming viewpoint they
aren't). The optimistic suggestion of a simple strip-out has been
refuted. How many of the people who are interested in *texts*,
not computer programming, will be up to writing an Emacs macro?

     I suspect that the only effective way is a brute-force
translation-table system, on the lines of that drafted by Richard
Goerwitz. (And why hadn't Icon been mentioned by the experts before?
This sort of thing is what it's for, surely.) For two or three years
now I have been using a general-purpose translation-table utility,
running on a PC, for converting WP text, which we can't expect the
authors to mark up for themselves, to generically-coded input for a
variety of typesetting systems (and now DTP software). The SGML
conversion is similar in principle, though a good deal more complex. I
had to write the software myself as we couldn't find anything that
would do the job on equipment that a small department could afford.
Richard Goerwitz's program is much more elegant. What's needed now are
draft translation tables which users can modify to their special
requirements. Even then, users will have to be taught what strings to
include to represent the 'hidden' wordprocessor codes, which they are
not aware of in WYSIWYG systems.
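
The translation-table idea described above can be sketched roughly as
follows. This is a minimal illustration in Python, not the utility
mentioned in the message and not Richard Goerwitz's Icon program. The
control codes and the generic tags in the table are invented for the
example and do not correspond to any real word processor or typesetting
system; a user would replace them with the codes their own software
writes to disk.

```python
import re

# Hypothetical translation table. The escape sequences on the left are
# made up for illustration; the generic codes on the right are likewise
# assumptions, not taken from any particular typesetting system.
TABLE = {
    "\x1bB": "<b>",       # bold on
    "\x1bb": "</b>",      # bold off
    "\x1bI": "<it>",      # italic on
    "\x1bi": "</it>",     # italic off
    "\n\n": "\n<p>\n",    # blank line becomes a paragraph marker
}


def build_translator(table):
    """Return a function that applies every table entry in a single pass."""
    # Try longer keys first, so that a long code is never shadowed by a
    # shorter code that happens to be its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def translate(text):
        # Each match is looked up in the table and replaced by its target.
        return pattern.sub(lambda m: table[m.group(0)], text)

    return translate


translate = build_translator(TABLE)
```

Calling `translate` on a string containing the hypothetical codes
returns the same text with the generic codes substituted. Because the
table is ordinary data, it is the part a user would edit to suit their
special requirements, while the program itself stays unchanged.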

     It's essential that the standard be easy to implement, even
if it's flexible enough to cover a wide range of conceivable
requirements. Otherwise people simply won't use it, any more
than, for example, they follow standards on citation of
documents by bibliographical references. Nor will most academics
be willing to use a second text editor simply to handle SGML.
They will want those which they are already using to be upgraded
to be compatible.


Christopher

Re: text editors, stripping, etc.

Erik Naggum
Christopher,

I don't understand your pessimism at all.  I particularly don't
understand what the lack of PCs in India has to do with this.  We can't
start doing heavy-duty computing by complaining about the lack of PCs in
India, anyhow.  Nothing good comes of complaining about how bad the
worst possible conditions might be.

I suggested that we sit down and figure out how certain things should
look on a display unit with few, if any, modes of graphical rendition.
Anders Thulin and others have brought forth the same idea, and every
time it looks as if they are the first to do so, because each time, the
idea is nearly drowned in "but it won't fit in/run on/etc my matchbox
computer".  Who cares?  First we make it work, then we make it work on
lots of machines.

If it is not possible to make people accept that they won't get SGML
software on their tiny portable Z80-based CP/M machines with a 14K
floppy disk and nineteen bits of memory, how about using the
minimization features of SGML to the extreme, where we accept a certain
code to mean "<keyword>"?  I mean, SGML is *powerful*.

Opinion:  You won't have any use for SGML on your Amstrad PCW or
Matchbox PC-2000, anyway, so why bother?

> 3. Even on UNIX etc. systems, it seems doubtful whether the text
> editors are up to the straightforward tasks I mentioned.  (To the
> average academic user, it will seem that the tasks *ought* to be
> straightforward, even if from a programming viewpoint they
> aren't). The optimistic suggestion of a simple strip-out has been
> refuted. How many of the people who are interested in *texts*,
> not computer programming, will be up to writing an Emacs macro?

I wonder, do you think SGML is the worst thing to happen to texts?
I'm interested in both texts and computer programming.  I may be able to
do some of the things that people who are interested in texts, only,
would like to see happen.  I see SGML as containing _information_, that
should _not_ be stripped away.  You can represent it in many different
ways, of which SGML is but one, but you don't strip it out.

I just don't get the main thrust of your suggestions.  Some
counter-suggestions to your "strip it" position:  We can make an empty
line mean paragraph start.  We can make "_word_" mean
"<emph>word</emph>".  We can make ">>" mean "<head>" and make the
"</head>" omissible in the DTD.  So you would have:

        >> Easy Markup with SGML

        An easy way to achieve widespread use of SGML _now_ is to let
        lots of people use existing software for the task of entering
        text.  The typewriter model has been used in the SGML document,
        for example.

        The already widespread use of word processors may allow us to
        increase the expectations of what software can do, provided
        that sufficiently powerful ideas about text management can make
        it to the producers of the widespread software.

to mean:

        <head>Easy Markup with SGML</head>
        <p>An easy way to achieve widespread use of SGML <emph>now</emph>
        is to let lots of people ... </p>
        <p>The already widespread use of word ...</p>

The former is undoubtedly easier to read, unless you have lots of
training in reading text marked up with SGML.
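
The three conventions above (an empty line starts a paragraph, "_word_"
means emphasis, ">>" opens a heading) can be expanded mechanically. The
following is a rough sketch in Python, under the assumption that the
conventions are applied outside SGML by a small preprocessor rather
than through SGML's own minimization features; the function name is
invented for the example, and the end tags are written out in full even
where the DTD might allow them to be omitted.

```python
import re


def shorthand_to_sgml(text):
    """Expand the plain-text conventions into explicit tags.

    Assumed rules, following the message: blocks are separated by empty
    lines, a block starting with ">>" is a heading, every other block is
    a paragraph, and _word_ marks emphasis.
    """
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin lines that were wrapped by the editor.
        body = " ".join(line.strip() for line in block.splitlines())
        # _word_ becomes <emph>word</emph>; the lookarounds keep
        # underscores inside a word (such as file_name) untouched.
        body = re.sub(r"(?<!\w)_([^_\s][^_]*)_(?!\w)", r"<emph>\1</emph>", body)
        if body.startswith(">>"):
            out.append("<head>" + body[2:].strip() + "</head>")
        else:
            out.append("<p>" + body + "</p>")
    return "\n".join(out)
```

Applied to the example text above, this yields one <head> element
followed by two <p> elements, with "now" wrapped in <emph>. Nothing is
stripped: the information carried by the blank lines, underscores and
">>" is kept and only its representation changes.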

Finally, I think we're discussing a non-problem.  Let's pick up on the
important things, and get something which will make people see, by
themselves, that it's smarter to use generalized markup than their
existing specific things.  I did it with a bunch of journalists in a
financial newspaper, and they are very delighted with the results.
They produce text both for fax transmission, electronic news to brokers
and others, _and_ newspaper columns, and now they can enter the text
only once.  Major incentive.

What do we have to offer people?  What can we make TEI mean to people
who use texts in more diverse ways than just reading them, or who would
do so if the means were available?

We must _not_ let ourselves be entrenched in existing technology.
It's not existing technology we're dealing with, in the first place.

[Erik Naggum]
Naggum Software, Oslo, Norway