|
I'm working on restoring a text file that was keypunched in the 1960s
to full upper/lower case ASCII. (It is currently ALL UPPER CASE.)
I would like to `do it right', so that more than just the upper/lower
case, italics, accent marks, paragraphs, and headings is correctly
identified. That is, capitalization seems to me to reflect a deeper
semantic reason: the capitalized thing is a particular type of proper
noun, such as the name of a person, country, company, etc. Italics
likewise reflect roles such as titles of books, plays, movies, etc.;
foreign words; quoted material; emphasis; and so on.
Does anyone have any suggestions as to how to do such tagging? I.e.,
if one encounters something such as,
`PRESIDENT KENNEDY TOLD PRIME MINISTER MACMILLAN ...'
and wants to restore it, with the reasons for its capitalization, as,
`President Kennedy told Prime Minister Macmillan...'
(Note: it is Macmillan not MacMillan) what should one do?
For instance, I could mark up the text as,
<Sentence>
<NationLeader country=USA>
<Title> President </Title>
<LName id=John_F._Kennedy> Kennedy </LName> </NationLeader>
told
<NationLeader country=GB>
<Title> Prime Minister </Title>
<LName id=Harold_Macmillan> Macmillan </LName> </NationLeader>
...
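
One rough way to get started might be a lookup table (a `gazetteer') of
known titles and names, matched longest-phrase-first against the
upper-case text. The Python sketch below is only an illustration under
stated assumptions: the GAZETTEER entries, the tag names, and the helper
functions restore, plain, and markup are all hypothetical, punctuation
is ignored, and unknown words are simply lower-cased (so any proper noun
missing from the table would come out wrong and need manual review).

```python
# Hypothetical gazetteer: upper-case phrase -> (tag, restored form, attributes).
# Entries and tag names are illustrative assumptions, not an established scheme.
GAZETTEER = {
    ("PRESIDENT",): ("Title", "President", {}),
    ("PRIME", "MINISTER"): ("Title", "Prime Minister", {}),
    ("KENNEDY",): ("LName", "Kennedy", {"id": "John_F._Kennedy"}),
    ("MACMILLAN",): ("LName", "Macmillan", {"id": "Harold_Macmillan"}),
}
MAX_PHRASE = max(len(key) for key in GAZETTEER)


def restore(sentence):
    """Return a list of (tag_or_None, text, attrs) for an all-caps sentence."""
    words = sentence.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in GAZETTEER:
                tokens.append(GAZETTEER[key])
                i += n
                break
        else:
            # Unknown word: lower-case it, capitalizing only at sentence start.
            word = words[i].lower()
            tokens.append((None, word.capitalize() if i == 0 else word, {}))
            i += 1
    return tokens


def plain(tokens):
    """Join the restored word forms into ordinary mixed-case text."""
    return " ".join(text for _, text, _ in tokens)


def markup(tokens):
    """Wrap each recognized phrase in its tag; leave other words bare."""
    parts = []
    for tag, text, attrs in tokens:
        if tag is None:
            parts.append(text)
        else:
            attr_text = "".join(f" {k}={v}" for k, v in attrs.items())
            parts.append(f"<{tag}{attr_text}> {text} </{tag}>")
    return " ".join(parts)


tokens = restore("PRESIDENT KENNEDY TOLD PRIME MINISTER MACMILLAN")
print(plain(tokens))
print(markup(tokens))
```

Keeping the restored spelling in the table, rather than deriving it by
rule, is what lets a name like Macmillan come out with the right internal
capitalization. Grouping a title and a name under an enclosing element
such as NationLeader would need a further pass that this sketch does not
attempt.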
|