Size of OED2

Michael S. Hart-2
Hate to contradict such a `reasoned' argument, but I HAVE the
machine-readable OED2 and it is .5G in size. That is of course with
the SGML tags in it, and there is no effort to minimize them--but
then too, I can typeset the OED2 in its original format from this
data.

Following Michael's comments about wanting to see it on the screen
as it appears in the book--yes, that is possible too since it can
be typeset into display Postscript and using a Postscript previewer
I can look at the entries in their original form. Note, for me
original form means variable-width typefaces fully justified,
bold, italic and roman fonts at two point sizes. Anything less
as output display doesn't look like the book to me. Pure ASCII
certainly wouldn't be a reasonable replacement.

The point here is that I think there is room to differ as to
what appearance one considers to be `true' to the book's form.

And... just to be wicked about this, I also retypeset a page
of the OED2 into Color Postscript assigning different content
categories of entries to different colors, e.g. etymologies to
green, citations to yellow, cross-references to red, headwords
to blue, etc. Simple translation of the content tags of SGML into
something more elaborate than their `faithful' renditions into
black-and-white bold, italic, and roman at two point sizes.
(Hummm... I wonder if this will get Hollywood (or Oxford) angry at me for
colorizing a classic...).
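
As a rough illustration of the retypesetting described above, here is a minimal Python sketch that renders one content-tagged entry through two interchangeable stylesheets. The tag names (`hw`, `etym`, `quote`, `xref`), the sample entry, and the typographic assignments in `FAITHFUL` are hypothetical stand-ins; the post does not give the actual OED2 tag set. Only the color assignments follow the post.

```python
import re

# Hypothetical content tags; the real OED2 SGML tag names are not given in the post.
FAITHFUL = {"hw": "bold", "etym": "roman", "quote": "roman", "xref": "italic"}
# Color assignments as described in the post.
COLORIZED = {"hw": "blue", "etym": "green", "quote": "yellow", "xref": "red"}

# Matches flat, non-nested <tag>text</tag> spans only (enough for this sketch).
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.S)

def render(entry, stylesheet):
    """Return (style, text) pairs, one per tagged span, in document order."""
    return [(stylesheet.get(tag, "roman"), text)
            for tag, text in TAG_RE.findall(entry)]

# Made-up placeholder entry, not actual OED2 data.
entry = ("<hw>example</hw><etym>placeholder etymology</etym>"
         "<quote>placeholder citation</quote><xref>sample</xref>")

faithful_page = render(entry, FAITHFUL)
colorized_page = render(entry, COLORIZED)
```

The same tagged source feeds both renditions; only the stylesheet dictionary changes.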

The point is that SGML is a scheme which carries the information as
to how the original work was put together, not a translation out of
that original capability into a shadow of its former self. From that
original information one can not only realize the original, but
also realize it on future display systems that are impractical with
existing hardware. Imagine how hard it would be to go from a representation
which ONLY described the font sizes and font styles to one which did
something different for each content category. Then imagine how much
harder than THAT it would be if you didn't even have the font sizes
and styles.
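
The difficulty of going backwards can be sketched in a few lines of Python. The mapping from content category to (font style, point size) below is a hypothetical assignment chosen for illustration, not the OED2's actual typography. Because several categories share one typographic style, a representation that records only the style cannot say which category a span belonged to.

```python
from collections import defaultdict

# Hypothetical content-category -> (font style, point size) assignments.
FAITHFUL = {
    "headword": ("bold", "large"),
    "definition": ("roman", "large"),
    "etymology": ("roman", "small"),
    "citation": ("roman", "small"),
    "cross_reference": ("italic", "small"),
}

def invert(stylesheet):
    """Map each typographic style back to every category that could have produced it."""
    candidates = defaultdict(set)
    for category, style in stylesheet.items():
        candidates[style].add(category)
    return dict(candidates)

def ambiguous_styles(stylesheet):
    """Styles from which the original content category cannot be recovered."""
    return {style: cats for style, cats in invert(stylesheet).items() if len(cats) > 1}

lost = ambiguous_styles(FAITHFUL)
```

Under these assumed assignments, small roman text could be either an etymology or a citation, so a colorized rendition that treats the two differently cannot be derived from the fonts alone.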
*******Response from Michael S. Hart*****
First, the previous notes I read and wrote referred to the OED, not OED2,
at least by name.  (Of course I know the subject about which _I_ wrote, but
to be perfectly honest, I think the note to which I responded meant OED2.)
The discussion about size and markup is equally relevant to both OED2
and OED.  Both documents are large, but much, much larger with markup.

I find that the descriptions of manipulation above are quite interesting,
but they would border on copyright infringement of a sort, and definitely
would if an edition were created and redistributed freely.  This is one of the reasons
I promote using Public Domain texts for etext creation.  Then, anyone can
use the text to create their own etexts, which they may or may not place
under copyright.  However, as I have been informed by the US Copyright Office,
the OED1 is mostly Public Domain, A - Sh, and the rest will become Public
Domain shortly, unless the copyright laws are changed again, and the last
few letters are grandfathered in for an extended period, as in 1976, when
all documents still under any copyright protection were granted a 75 year
copyright, even if their renewed copyright was to have expired the day on
which the new law took effect.  This was quite a blow to those of us with
plans to put these documents into Public Domain etexts, especially when a
project was already underway to do so.  Thus we already have many etexts,
sad to say, which cannot be distributed freely, because they were made,
erroneously, in anticipation of their entry into the Public Domain.

We promote universal accessibility to etexts, meaning that anyone should,
would, and could use them on any computer with any text program - without
having to render allegiance to any standards or copyrights.  I was not at
all aware that SGML wanted to become the only standard until this week; I
apologize for any miscommunication which may have transpired as a result.
I erroneously thought this was a group dedicated to the electronic coding
of text, i.e. the creation of etexts:  I did not realize that this group,
which calls itself the Text Encoding Initiative, meant to make theirs the
only standard for etext, as I have been told.  As I understand it, SGMLing
any text, be it copyrighted or Public Domain, allows the SGMLer to claim the
right of copyright protection (now for life plus 50 years - so no one can
know when it will become Public Domain until the author dies, and then there
is a fifty-year wait, during which the law could be changed again.)

Michael S. Hart