Re. SGML browsers

Steven J. DeRose
I appreciate Lee Quin's post re. SGML browsing software,
and especially his caveat that
>you must take what I say with a pinch of salt.

Similar caveat:
 I am the Senior System Architect for EBT; use salt as desired.

I feel I should respond to Lee's posting since I am more familiar
with our products (such as DynaText) and their features.

--------------- On DynaText(tm):

>DynaText is a `cheap and cheerful' SGML browser.

I don't quite know what "cheerful" means for a software product; "cheap"
wouldn't have been my first-choice adjective, although our academic discount
gets you the entire software system free except for the normal yearly
maintenance and support charge. This seems a pretty good break to me.

Searching is not actually limited in the ways described --
you can have very long phrase searches if desired, use regular expressions,
control stop/go lists, search for structures of attributes and elements,
as well as search for punctuation, words with or without diacritics, etc.
Readers can make annotations (which Lee mentions), but can also make
hyperlinks (which he mentions only as a SoftQuad feature).

For graphics, we support a large range of formats, either displayed inline
or accessed via links. Likewise for a wide variety of table DTDs.
An unusual feature of DynaText is that you can format non-tabular markup
as tables when needed. We also support TeX equations embedded in your SGML
documents, and all the major SGML DTDs for equations.

>The main limit of DynaText is that it uses the `Electronic Book' model --
>you `bind' the pages in a single process, so you can't unindex, or change
>documents later, or add documents.

The use of the word "pages" could easily be misunderstood here. We do not
save formatted pages, in the manner, say, of Acrobat(tm). Indeed, I think it
fair to say EBT has been among the first and strongest advocates of moving from
fixed pages to a completely dynamic online presentation. You can indeed
"unindex", if that means get your SGML back out. You can also replace or add
documents any time you want (and reader annotations can be automatically
reattached across revisions).

The "bind" issue is more interesting. The only way to view SGML without
a binding process is (by definition) to re-parse each document each time
it is opened for viewing. This is a nice feature. But since SGML itself is
defined in a way that precludes correct parsing if you start in the middle,
you have to parse the *whole* document. So if you want to see the last
part of a 2 MB book, you must wait while the whole thing is read from
your disk or CD to be processed. In the same way, if you want to search
for something and no "binding" has been done, you wait for a serial search
(which requires processing the whole document up to the next hit). These
operations are intrinsically bounded by disk speed and document size.
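
To make that cost concrete, here is a minimal Python sketch of a serial
search. It assumes a simplified, fully tagged markup (no tag omission,
entities, or marked sections, all of which real SGML allows), and the
function names are hypothetical. The point is only that the element context
of a hit cannot be known without processing every tag that precedes it.

```python
import re

# Matches start and end tags in a simplified, fully tagged markup.
# Assumption: no tag minimization, entities, or marked sections.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")

def context_at(doc, offset):
    """Return the stack of open element names at `offset`.

    The scan always starts at the beginning of the document, so the
    work done grows with the position of the hit.
    """
    stack = []
    for m in TAG.finditer(doc):
        if m.start() >= offset:
            break
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            stack.append(name)
        elif name in stack:
            # Pop back to (and including) the matching start tag.
            while stack.pop() != name:
                pass
    return stack

def serial_search(doc, term):
    """Find the first occurrence of `term` and its element context."""
    offset = doc.find(term)
    if offset < 0:
        return None
    return offset, context_at(doc, offset)

doc = (
    "<book><chap><title>One</title><p>alpha</p></chap>"
    "<chap><title>Two</title><p>omega</p></chap></book>"
)
hit = serial_search(doc, "omega")
print(hit[1])  # ['book', 'chap', 'p']
```

Even in this toy case, finding a word in the last paragraph means walking
over every tag in the first chapter.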

EBT avoids any such delays by a "make-book" process that builds structural
and full-text indexes. However, most crucially, we do not discard the
SGML information. This means you can search with reference to all the
structural information of your original documents, and you can export any
portion of your "bound" book back to the same SGML structure. This, by the
way, is a major shortcoming of any browser that enforces a particular DTD:
you could of course convert your SGML to HTML or to anything, but then the
hard-earned structural information would be gone (rather defeating the purpose).
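
The general idea can be sketched in a few lines of Python. This is an
illustration only, not EBT's implementation: the names `make_book` and
`search` are hypothetical, and the markup is assumed to be simplified and
fully tagged. One pass records, for every word, its offset and the path of
elements enclosing it, and the original source is kept alongside the index
so nothing structural is lost.

```python
import re
from collections import defaultdict

# One pattern for both tags and words, so a single pass sees them in order.
# Assumption: simplified, fully tagged markup (not full SGML).
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>|([A-Za-z0-9]+)")

def make_book(doc):
    """Build a word index with element paths; keep the source untouched."""
    index = defaultdict(list)
    stack = []
    for m in TOKEN.finditer(doc):
        closing, name, word = m.groups()
        if word:
            index[word.lower()].append((m.start(), "/".join(stack)))
        elif closing:
            if stack and stack[-1] == name.lower():
                stack.pop()
        else:
            stack.append(name.lower())
    return {"source": doc, "index": dict(index)}

def search(book, word, within=None):
    """Look a word up in the index, optionally only inside one element type."""
    hits = book["index"].get(word.lower(), [])
    if within is None:
        return hits
    return [h for h in hits if within in h[1].split("/")]

doc = "<book><chap><title>Whales</title><p>Whales sing.</p></chap></book>"
book = make_book(doc)

all_hits = search(book, "whales")                    # title and paragraph
title_hits = search(book, "whales", within="title")  # title only
exported = book["source"]                            # original markup, intact
```

After the one-time pass, a lookup no longer depends on where in the document
the hit falls, and the structural query ("only in titles") comes for free.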

>Also -- and this is probably a major irritation for TEI users -- all of
>the documents in a book have to share the same DTD.

EBT's software imposes only SGML's own requirement: A single SGML document
instance must conform to a single DTD -- perhaps not a "major irritation"
for most TEI users. There is no requirement that documents in a collection
share a single DTD. Nevertheless, you can do full-text searches across all
documents, and even set up mappings to treat different element type names
in different documents as synonymous for searching.
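
As a rough illustration of such a mapping (a sketch under assumptions, not
the actual DynaText mechanism), suppose each document has been reduced to a
list of (element name, text) pairs. The `SYNONYMS` table, the sample
collection, and the function names below are all hypothetical.

```python
# Hypothetical mapping: element names from different DTDs that should be
# treated as the same thing when searching.
SYNONYMS = {"head": "title", "hd": "title", "para": "p"}

def canonical(name):
    """Map an element type name to its canonical search name."""
    name = name.lower()
    return SYNONYMS.get(name, name)

def search_collection(collection, word, within):
    """Return ids of documents where `word` occurs inside `within`
    or any element type mapped to the same canonical name."""
    target = canonical(within)
    word = word.lower()
    results = []
    for doc_id, elements in collection.items():
        for element, text in elements:
            if canonical(element) == target and word in text.lower().split():
                results.append(doc_id)
                break
    return results

# Two documents that follow different DTDs: one uses <title>, one <head>.
collection = {
    "novel": [("title", "Moby Dick"), ("p", "Call me Ishmael")],
    "poem": [("head", "The Whale and Moby"), ("para", "A short verse")],
}

found = search_collection(collection, "moby", within="title")
print(found)  # ['novel', 'poem']
```

The documents keep their own element names; only the search layer treats
them as equivalent.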

------------------- On Explorer:

>We (SoftQuad) have recently announced -- and started shipping -- the
>SoftQuad Explorer.  This is a `next generation' browser that concentrates
>on taking advantage of the SGML structure.  You can classify documents in
>multiple ways, and browse a graphical tree representation of the `thesaurus'
>that you've built; there's a full text index (perhaps not as powerful as the
>one in DynaText at this point, though), and you can make annotations.

I'm reluctant to consider Explorer "next generation" relative to us except perhaps
in the genetic or temporal sense. I think DynaText does "take advantage
of the SGML structure" quite thoroughly and effectively (I would certainly be
interested in hearing from the readers of the list about any limitations they
have encountered). Certainly our design of DynaText has concentrated heavily on
just these issues of structured documents, SGML, and textual research concerns.

>In addition, you can make hypertext links, and you can do so in your own
>`web', so that individual researchers can share the same text and compare
>notes -- you can have multiple webs open at once.

All this seems to me to be true of DynaText as well.

>Well, OK, I'm biassed, but I think Explorer is a lot more interesting than
>DynaText if you want to explore texts or do research -- and that's what it
>was designed for.  I know that there are some Explorer (originally called
>SGML Darc) users on this list, so I'll try not to be too enthusiastic in
>case they disagree :-)

Obviously this paragraph is an opinion, and one with which I disagree.

It does bring to mind, however, the very interesting question of just
what specific feature(s) readers of this list consider particularly useful to
textual scholars, and how particular products support them. This seems
relevant for TEI-L to discuss. I'll make a small start: it seems to me
that one major need of textual scholars is fast response to search, navigation,
and display requests over large bodies of data -- for which some kind of
preprocessing seems to be the only practical solution, and for which the
indexed searching must apply intra- as well as inter-book. Another need is
for tools that can deal quickly and very flexibly with complicated structures.
And a third is the ability to display variant versions of a text on demand.
These are all things customers have done successfully with DynaText for some
time.


A few other DynaText advantages I cannot resist mentioning in closing:
   * Localization: even for Japanese documents and interfaces.
   * Platform support and book portability: Mac, Windows, many Unixes.
   * Scalability: For example, the rather large English Poetry Database.
   * Customizability: an extensive C toolkit that you can see in use
     on all Novell NetWare 3.12 and 4.0(tm) and SGI Unix systems.
   * Tool integration: See van Herwijnen's online *SGML Tutorial*.
   * Built-in support for TEI constructs, such as extended pointers.

Steven J. DeRose
Sr. System Architect
Electronic Book Technologies
One Richmond Square
Providence, RI 02906 USA
(401) 421-9550, fax -9551