Michael's question (which I paraphrase from memory as "why have a
special tag for PLACE-NAME-WHICH-IS-TAGGED-AS-OPPOSED-TO-PLACENAME-WHICH-ISNT?")
was rather nicely answered before it was asked, by Bob Amsler's comment
that it was precisely words of the kind tagged in my texts which were
of interest for some applications. 'Bath' is not just a placename: it's
a placename which a dumb (or not so dumb) piece of software might
mistake for a common noun. The intention of the encoder of my texts was
to distinguish words that should be distinguished, not to categorise all
the words of the text. I am coming round to the view that I need a
tag like <homograph> to do an honest job on this text.

Bob Kraft's suggestion (that the tags should be separated out from the
words) raised a few hairs on my spine: that would mean deciding somewhere
whether each code related to the word before it or the word after it,
and to how many words on either side ... No, the word and its codes have
to be treated as a unit of some kind.
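
To make the worry concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the code name "placename", the sample words, and the helper names Token, coded_words and candidates are all hypothetical, and none of them comes from the texts under discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """A word stored together with its code, so the two cannot drift apart."""
    word: str
    code: Optional[str] = None  # hypothetical code name, e.g. "placename"


# Inline: the code travels with the word it describes.
inline: List[Token] = [
    Token("a"), Token("train"), Token("to"), Token("Bath", "placename"),
]

# Separated out: a bare code sits between words, and nothing in the stream
# says whether it belongs to the word before it or the word after it.
separated = ["a", "train", "to", "<placename>", "Bath"]


def coded_words(tokens: List[Token]) -> List[str]:
    """Return the words that carry a code."""
    return [t.word for t in tokens if t.code is not None]


def candidates(stream: List[str], i: int) -> List[str]:
    """Return the neighbouring words a free-standing code at index i might refer to."""
    before = stream[i - 1] if i > 0 else None
    after = stream[i + 1] if i + 1 < len(stream) else None
    return [w for w in (before, after) if w is not None]
```

With the word and code held as a unit, coded_words(inline) needs no guesswork; with the code separated out, candidates(separated, 3) can only offer both neighbours and leave the choice to some convention decided elsewhere.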

My suggested use of entity refs seems to have died the death, so I will not
revive it here.

Everyone quite rightly shuddered at the thought of throwing the
information away. Just for the record, though, that's what the encoder
of the text proposed....

Thanks to everyone anyway -- there's more where that came from.