stripping markup etc.

Christopher Currie
An assumption behind Erik's argument is that software developed
on big systems will eventually trickle down to smaller ones. But
the situation where most people do not have access to the current
technology will continue, because the current technology is
always the most expensive. I doubt whether, for example, Unix
will be widely used on small personal micros, however much more
powerful they become than the present ones, except in a stripped-
down version with a user-friendly interface. By that time, with
Erik Naggum's approach, the things the experts will be doing
with the standard will again be beyond the capacity of the
majority of users, who will therefore be excluded from being able
to use the output effectively.

     Robin Cover makes pertinent remarks about the waste of effort
involved in conforming to details of punctuation in bibliographic
standards. But I meant more than that. I spent some years working
on the British Standards for citation of published and unpublished
documents by bibliographical references. We were not concerned to
prescribe details of punctuation, etc., although we gave examples
of what was thought to be good practice. We were concerned with
the generalized form of the reference and its elements. One
recommendation was that the name and date (Harvard) system is
usually unsuitable for citing unpublished documents (which is
obvious when you think about it). But journal editors who deal
mainly with papers citing only published work often insist on the
name and date system for citing unpublished documents as well, for
obvious reasons of their own convenience and house style. The
results are ghastly, but there's nothing authors can do about them.

     We also required a 'descriptive element' in the reference to an
unpublished document, so that the document could be recognized in
another copy or form. This requirement is widely ignored, because
such descriptions involve difficult choices and because the
descriptive element lengthens the reference so much as to affect
the cost of publishing books and papers significantly in many
cases. It also increases the labour for the author. Those points
were of course raised while the standard was being drafted, but
the chairman of the committee chose to ignore them, using
arguments very similar to Erik's.

     Adding detailed generalized markup to a document by hand, or
putting it in as the document is typed, is very labour-intensive
and offers considerable scope for error. That is why it's common
sense to seek to insert it as far as possible automatically and
consistently after the document has been typed. Early word
processors often required you to insert formatting commands as
special lines in the text (e.g. WordStar dot commands, Scripsit
format lines). Users have moved away from this approach to more
WYSIWYG systems and in some cases to tagging through a graphical
interface. Such editors on micros are much more
user-friendly than mainframe text editors, however 'powerful' the
latter may be. Users won't want to move back unless we make it
easy for them.
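The idea of inserting markup automatically and consistently after the document has been typed can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical `<p>` paragraph tags; a real system would consult a document type definition and handle many more structures:

```python
# Minimal sketch (hypothetical tag name <p>): wrap each blank-line-
# separated paragraph of plain typed text in SGML paragraph tags
# automatically, so the typist never keys the markup by hand.
def tag_paragraphs(text):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)

typed = "First paragraph.\n\nSecond paragraph."
print(tag_paragraphs(typed))
```

Even so simple a rule removes one class of typing error entirely: the tags are guaranteed to be consistent and properly closed.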

     Erik supposes that you want to leave the SGML markup in the
document in all cases, because you lose information if you take
it out. Surely in fact you want to leave it in one copy which you
can then manipulate with suitable software, but to strip it out
(or replace it by appropriate formatting) for a human-readable
copy, which you may want to print out, show to a colleague,
publish as a book, etc. (In the last case it is the typesetting
system that needs to convert and/or strip out the markup).
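Producing the human-readable copy from the marked-up master is mechanically trivial, which supports the point that the markup need only live in one copy. A rough sketch, assuming simple angle-bracket tags with no `>` characters inside attribute values:

```python
import re

# Sketch: strip SGML tags from the marked-up master copy to produce a
# plain human-readable copy. Assumes no '>' inside attribute values,
# which simple SGML markup usually satisfies.
def strip_markup(sgml):
    return re.sub(r"<[^>]+>", "", sgml)

master = "<p>The <emph>marked-up</emph> master copy.</p>"
print(strip_markup(master))  # The marked-up master copy.
```

Replacing tags with formatting, rather than merely deleting them, is the same loop with a lookup table in place of the empty substitution.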

     I conclude that the software for making the actual job
of encoding or decoding text as simple as possible needs to
be developed, or at least designed in general terms,
*concurrently* with the encoding standard, should run on as many
computers and operating systems as possible, should be cheap
and easy to use, and should be capable of incorporation into
mainstream word processors. The software requirements need to be
borne in mind at each stage of defining the standard. To define a
standard first, and then leave users or commercial companies to
work out how to implement it conveniently, is to ensure that it
won't be as widely used as it should be.

Christopher

Re: stripping markup etc.

Robert A Amsler-2
Christopher Currie writes,

    ``To define a standard first, and then leave users or commercial
    companies to work out how to implement it conveniently, is to
    ensure that it won't be as widely used as it should be.''

Yet this is precisely how MOST standards are designed. The reason is
that no commercial software developer is going to build software for
something that isn't ALREADY a standard. If they did they would have to
continually revise it as changes in the drafts of the standard were
being made. It would also be unmarketable UNTIL the standard was
approved....

I.e., would YOU buy software for a ``proposed'' standard that might be
changed, perhaps so completely that your software wouldn't be useful?
Probably ONLY if it were being proposed by a hardware manufacturer,
but even then it isn't safe (remember Betamax!).

It took a decade for SGML to be developed and publicized. The ONLY
reason you are considering software now is that it has been a standard
for a few years and people are beginning to use it. The software
potential is considerable, since SGML is a higher-level standard than
all existing text-formatting non-standards (that is, it can be
translated down into any of them, but they cannot be translated up into
SGML without human reconsideration, e.g. why `italics'?).
SGML also seems capable of conversion into database formats and of
direct use in text retrieval tasks.
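The asymmetry of "translating down" can be made concrete. A sketch, with a hypothetical mapping from a descriptive SGML tag to troff font-change escapes; the reverse direction is ambiguous, since italics on the page may encode emphasis, a book title, a foreign word, and so on:

```python
# Sketch of "translating down": map a descriptive SGML tag to one
# formatter's presentational commands. Hypothetical mapping; here
# <emph> becomes the troff italic on/off escapes \fI and \fP.
DOWN = {"<emph>": "\\fI", "</emph>": "\\fP"}

def translate_down(sgml):
    for tag, cmd in DOWN.items():
        sgml = sgml.replace(tag, cmd)
    return sgml

print(translate_down("a <emph>vital</emph> point"))
```

Going down is a table lookup; going up requires a human to decide what each italic span *meant*.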

Re: stripping markup etc.

Erik Naggum-2
In reply to this post by Christopher Currie
> An assumption behind Erik's argument is that software developed
> on big systems will eventually trickle down to smaller ones.

That is my assumption.  It has also happened with a formidable amount
of software.  Several large mainframe-based packages are available
today, in their latest versions, on smaller systems such as PC's.  SPSSx,
MINITAB, Scribe, and TeX are instances that I have used myself.  The ideas
behind these products also pervade the business end of the PC software
market.  Granted, it has taken a few years, like ten or so.  "Such is
life when you work revolutions."

> I doubt whether, for example, Unix
> will be widely used on small personal micros, however much more
> powerful they become than the present ones, except in a stripped-
> down version with a user-friendly interface.

This is the trend today.  See BYTE magazine, for instance.  SCO and
Interactive sell Unixes for PC's and they're getting a lot of
attention.  OS/2 is considered to be kept alive artificially, since a
lot of people already have DOS, and the functionality sported by OS/2
can most often be found in Unix.  And besides, "small micros" today
typically have more memory and disk than an average mainframe 5 years
ago.  This means that as technology makes it possible to mass-produce
these "huge small micros", we will have the power of the mainframe at a
very tiny fraction of the cost.  The software will then be available on
the small machines, because "small" was redefined in the process.

> By that time, with
> Erik Naggum's approach, the things the experts will be doing
> with the standard will again be beyond the capacity of the
> majority of users, who will therefore be excluded from being able
> to use the output effectively.

This has not been the case with software systems in the most recent
10 years, and particularly not in the past 5 years.  History counters
your claim.

>      Erik supposes that you want to leave the SGML markup in the
> document in all cases, because you lose information if you take
> it out. Surely in fact you want to leave it in one copy which you
> can then manipulate with suitable software, but to strip it out
> (or replace it by appropriate formatting) for a human-readable
> copy, which you may want to print out, show to a colleague,
> publish as a book, etc. (In the last case it is the typesetting
> system that needs to convert and/or strip out the markup).

You equivocate over "strip".  By "strip" I understand "remove, without
doing much to replace the functionality thereof."  I do not consider
"strip" to be even similar to "processing".  That would be like saying
that you "strip" the programming language from around the constant text
strings when you compile the program, so you can see the strings on the
terminal.

I don't think anybody would believe that I suggest that we should
publish books with the SGML tags intact.  If you do so, please send
me private mail, so we can get that cleared up.

>      I conclude that the software for making the actual job
> of encoding or decoding text as simple as possible needs to
> be developed, or at least designed in general terms,
> *concurrently* with the encoding standard, should run on as many
> computers and operating systems as possible, should be cheap
> and easy to use, and should be capable of incorporation into
> mainstream word processors. The software requirements need to be
> borne in mind at each stage of defining the standard. To define a
> standard first, and then leave users or commercial companies to
> work out how to implement it conveniently, is to ensure that it
> won't be as widely used as it should be.

That was the way TCP/IP was developed.  Same with OSI.  Most ISO
standards are developed by industry representatives.  Why should
TEI be different?  As I said: "First we make it work, then we make it
work on lots of machines."  Anything else will not make it cheap.

How widely "should" a standard be used?  Let people decide, by
showing them what can be done, and then making them want it.  Horribly
complex things such as X.400 and EDI have worked this way.  It takes a
little effort to make it happen, though, or a very good idea.  I think
we have a very good idea.  Nonetheless, people have to get to know about it.

I rest my case.  Personal comments are invited, as I think this list
should go on to discuss other things, because Christopher's posts don't
vary much from the previous ones, as is the case with mine in this
debate.  I challenge you all to come up with a better way to spread
software than to make it work well on some platform, and let people see
what it can do, and then demand it for their own platform.  That _was_
the way several large-scale software packages found their way to PC's,
incidentally.

To quote a friend in the Ada software business:  "The best way to
predict the future is to invent it."  (Ed Berard)

[Erik Naggum]
Naggum Software, Oslo, Norway