Info on automated tagging: TEI

Info on automated tagging: TEI

Gloria McMillan
Hi!

        I was just talking to Cindy Dooling over at our computer
dep't. and she suggested that I ask you this:

        Can anyone give info, which I might pass along to her about
what is available and is possible to run over a VAX that will
make TEI tags automatically?
        Cindy gave for an analogy the simple commands in Word Perfect
that do bold w/o making the user say complex programming instructions, or the
macros...

        What I am saying is that she is very interested to hear what
might be out there that will semi-automate this whole, elaborate
tagging process. It may be that there is no such animal, but it never
hurts to ask...

        Thanks in advance!!
                                Gloria

-----
Gloria McMillan                 Adjunct Faculty, Pima Community College
EMail: [hidden email]  Writing Department

Re: Info on automated tagging: TEI

David Megginson
>>>>> "Gloria" == Gloria McMillan <[hidden email]> writes:

 > Hi!  I was just talking to Cindy Dooling over at our computer
 > dep't. and she suggested that I ask you this:

 >         Can anyone give info, which I might pass along to her about
 > what is available and is possible to run over a VAX that will make
 > TEI tags automatically?  Cindy gave for an analogy the simple
 > commands in Word Perfect that do bold w/o making the user say
 > complex programming instructions, or the macros...

Are you tagging an existing text, or creating a new one?

For creating a new text, the best free product available is PSGML, a
fairly complete SGML editor running under GNU Emacs.  I have used
PSGML extensively with the TEI DTD and have come to love its ability
to insert the right tag in the right place.  On the WWW, you can find
information on the PSGML Home Page:

  http://www.lysator.liu.se/projects/about_psgml

or you can download the program and documentation directly by
anonymous FTP:

  ftp://ftp.lysator.liu.se/pub/sgml/psgml-1a3.tar.gz

Note that for either of these, you require _at least_ version 19.19 of
GNU Emacs (the version numbers for XEmacs, née Lucid, are different).


You did not specify whether the VAX is running Unix or VMS.  If it is
running VMS, you should fire your computing staff :-); if it is
running Unix, there are also commercial editors available which are
faster and offer more bells and whistles -- you can find a fairly
complete list by anon. FTP at

  ftp://ifi.uio.no/pub/SGML/SGML-Tools/SGML-Tools.txt

(The same directory also has PostScript versions for several page
formats.)  One of the most popular products, and also one of the
earliest on the market, is Author/Editor from SoftQuad, which also has
a range of other SGML-related products (I do not know which Unix
variants it runs on, however):

SoftQuad Inc.
  56 Aberfoyle Crescent, Suite 810
  Toronto, Ontario
  M8X 2W4
  Canada
  Tel: +1 (416) 239-4801
  Fax: +1 (416) 239-7105

If you are adding markup to an existing text, you need to take a
completely different approach.  A good hacker can use a
text-processing language like Perl to do most of the work, but for a
dedicated SGML system, I have been hearing good things about OmniMark
from Exoterica, here in Ottawa (again, it might not run under your
Unix):

Exoterica Corp.
  1545 Carling Avenue, Suite 404
  Ottawa, Ontario
  K1Z 8P9
  Canada
  Tel: +1 (613) 722-1700
  Fax: +1 (613) 722-5706
  Email: [hidden email]
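
To illustrate the scripting approach to tagging an existing text, here
is a minimal sketch in Python rather than Perl. It assumes, purely for
illustration, that paragraphs are separated by blank lines and that
emphasis is written _like this_; the function name tag_plain_text is
hypothetical, and a real text would need many more rules before it
validated against the TEI DTD.

```python
import re
from xml.sax.saxutils import escape


def tag_plain_text(text: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> and _runs_ in <hi>."""
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    tagged = []
    for para in paragraphs:
        # Join hard-wrapped lines, then escape &, < and > before adding tags.
        body = escape(" ".join(para.split()))
        # Assumption: _underscores_ mark emphasis in the source text.
        body = re.sub(r"_([^_]+)_", r'<hi rend="italic">\1</hi>', body)
        tagged.append(f"<p>{body}</p>")
    return "\n".join(tagged)


if __name__ == "__main__":
    sample = "You need _at least_ version 19.19\nof the editor.\n\nSalt & pepper."
    print(tag_plain_text(sample))
```

Escaping happens before the tags are inserted so that the markup added
by the script is not itself escaped.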

Do not by any means limit your search to these products, but give the
list at ifi.uio.no a good and careful reading.


David

---
David Megginson                Department of English, University of Ottawa,
[hidden email]       Ottawa, Ontario, CANADA  K1N 6N5
[hidden email]    Phone: (613) 564-6850 (Office)
[hidden email]             (613) 564-9175 (FAX)

Re: Info on automated tagging: TEI

Ole Norling-Christensen
Gloria McMillan asks

> what is available and is possible to run over a VAX that will
> make TEI tags automatically?
>
DIPA is a fast and reliable parser that we developed in order
to tag dictionary entries. Several printed dictionaries have been
converted into SGML files by this DIctionary PArser (e.g. for
making new editions in the SGML editing tool GestorLEX, or for extracting
distinct information types for reuse). Of course, the input
need not be a dictionary, but it must be "dictionary-like" in that
it consists of fairly small subdocuments (like dictionary entries)
that can be parsed one at a time.

As its input it takes:

- a "grammar": a set of rewriting rules that define the structure which
  is to be made explicit by SGML tags; three types of "leaves" are accepted:
  character sets (e.g. letters of an alphabet); concrete strings (e.g.
  different kinds of punctuation); and references to the "lexicon"

- a "lexicon" of classes of concrete strings, e.g. the dictionary's
  abbreviations for parts of speech, subject and register labels, etc.

- the text to be syntax-checked and/or tagged.

Output:

- a file of all accepted entries

- a file of the non-accepted entries, with a marking of the place in
  the text where the parser gave up its analysis (and the reason why)

- an SGML-tagged version of the accepted entries
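
The inputs and outputs listed above can be pictured with a toy sketch
in Python. This is not DIPA's code or its rule format: the flat rule
list, the tag names (hw, pos, def), the lexicon contents and the
function names are all assumptions made up for illustration, and real
rewriting rules would be recursive rather than a fixed sequence. It
only shows the three leaf types and the three kinds of output.

```python
import string

# Hypothetical lexicon: one class of concrete strings (part-of-speech labels).
LEXICON = {"pos": ["adj.", "n.", "v."]}

# Hypothetical flat grammar: (tag or None, leaf kind, leaf argument).
GRAMMAR = [
    ("hw", "chars", string.ascii_lowercase),           # character set
    (None, "literal", " "),                            # concrete string
    ("pos", "lexicon", "pos"),                         # lexicon reference
    (None, "literal", " "),
    ("def", "chars", string.ascii_lowercase + " ,;"),
]


def match_leaf(kind, arg, text, pos):
    """Return the end offset of a match starting at pos, or None."""
    if kind == "chars":
        end = pos
        while end < len(text) and text[end] in arg:
            end += 1
        return end if end > pos else None
    if kind == "literal":
        return pos + len(arg) if text.startswith(arg, pos) else None
    if kind == "lexicon":
        # Try longer strings first so a short label cannot shadow a longer one.
        for word in sorted(LEXICON[arg], key=len, reverse=True):
            if text.startswith(word, pos):
                return pos + len(word)
        return None
    raise ValueError(f"unknown leaf kind: {kind}")


def parse_entry(entry):
    """Return (tagged, None) on success or (None, (offset, reason)) on failure."""
    pos, parts = 0, []
    for tag, kind, arg in GRAMMAR:
        end = match_leaf(kind, arg, entry, pos)
        if end is None:
            return None, (pos, f"expected {kind} {arg!r}")
        piece = entry[pos:end]
        parts.append(f"<{tag}>{piece}</{tag}>" if tag else piece)
        pos = end
    if pos != len(entry):
        return None, (pos, "unparsed text after end of entry")
    return "<entry>" + "".join(parts) + "</entry>", None


def run(entries):
    """Split entries into accepted, rejected (with offset and reason), tagged."""
    accepted, rejected, tagged = [], [], []
    for entry in entries:
        result, error = parse_entry(entry)
        if error is None:
            accepted.append(entry)
            tagged.append(result)
        else:
            rejected.append((entry, *error))
    return accepted, rejected, tagged


if __name__ == "__main__":
    for group in run(["house n. a building", "run x. to move fast"]):
        print(group)
```

The second sample entry is rejected at offset 4, where "x." is not found
in the lexicon, which mirrors the idea of marking the place where the
parser gave up and why.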

You can run it in batch mode or interactively. In the latter case, any
parsing error brings up a menu that lets you edit the grammar, the
lexicon or the entry.

For practical use, most input files have to be preprocessed in order
to adjust tags/codes, and perhaps the character set. In some cases,
some postprocessing of the tagged file may also be needed.

For the time being, DIPA is used under DOS and UNIX; it is written
in Microsoft C, so it should be possible to recompile it for use
on, e.g., a VAX.

For further information, please contact me at this address:

*****************************************
* Ole Norling-Christensen               *
* [hidden email]                  *
*                                       *
* The Danish Dictionary, KUA            *
* Njalsgade 80, DK-2300 Copenhagen S    *
* Tel +45 3532 8995 - Fax +45 3154 2595 *
*****************************************