Hierarchy of XML files for <text> transcription and <msDesc>


Sebastiaan Verweij-3
Dear TEI list,

I’m coming back to TEI after a long spell of other things, and am wondering how best to approach a new project. I have IT support at my university, though we are building this sort of DH resource here for the first time. We are using an eXist database. 

We will be digitising British Library manuscripts, creating bibliographical descriptions for these, and creating transcriptions of selected texts. The texts are so-called ‘manuscript pamphlets’, political in nature, often copied in multiple manuscripts, and appearing in different kinds of manuscript (e.g., letter books, journals, bound separates). Most users will come to our website wanting to read the texts alongside the MS images. Secondarily, however, they will want to find info about all the witnesses containing the same text, and associated info (e.g., who made the copy, when, and where, so that we can also trace networks of dissemination). In addition, we’d like to list manuscript content for each witness separately (though not exhaustively).

I am wondering how best to organise my corpus. I’m very used to writing single-witness XML, with a <teiHeader> containing a <msDesc>, and a <text> element with full transcription, and/or a <facsimile>. We now have research questions around two major ‘units’. The first unit is the texts themselves: their content (in full diplomatic transcription, lightly marked up for names, dates, places), the associated text types (e.g., a prophecy, a speech, a proclamation), and their subject (e.g., royal succession, taxation, death of Prince Henry). The second ‘unit’ of interest is the manuscript itself: even if we only select a single manuscript from which to transcribe, we still need to capture information about all the others containing copies, in the usual style of <msDesc>, including (selected) contents.

We now have three possible ways forward: 

We encode the info about all manuscript witnesses in a single <teiHeader> in the same doc that contains the <text> transcription, using multiple instances of <msDesc> under <listBibl>. The huge downside to this is that some texts may have, say, 20 manuscript witnesses (with a lot of associated bibliographical detail); moreover, there will also be considerable duplication of <msDesc> where different texts are shared by the same manuscript (so we would have many identical <msDesc> sections across different XML docs that transcribe different texts).
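
For illustration, this first option might look roughly like the sketch below. The xml:id values and bracketed placeholders are hypothetical, and the second <msDesc> stands in for what could be twenty or so.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[title of the pamphlet text]</title>
      </titleStmt>
      <publicationStmt>
        <p>[publication details]</p>
      </publicationStmt>
      <sourceDesc>
        <!-- every witness of this text is described in full, here -->
        <listBibl>
          <msDesc xml:id="ms-a">
            <msIdentifier>
              <settlement>London</settlement>
              <repository>British Library</repository>
              <idno>[shelfmark of the transcribed witness]</idno>
            </msIdentifier>
            <!-- contents, physical description, history ... -->
          </msDesc>
          <msDesc xml:id="ms-b">
            <msIdentifier>
              <settlement>London</settlement>
              <repository>British Library</repository>
              <idno>[shelfmark of a further witness]</idno>
            </msIdentifier>
            <!-- the same description would recur in every other text
                 file whose text also appears in this manuscript -->
          </msDesc>
        </listBibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>[diplomatic transcription]</p>
    </body>
  </text>
</TEI>
```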

We separate out XML docs for manuscripts and texts. For the former, we’d create a single XML for each MS perhaps nested like this:

The upside here would be less duplication; the downside is that the text XMLs themselves would contain not the MS description data but only references to these other docs.
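
As a rough sketch of this second option (not necessarily the nesting we would settle on), a manuscript file and a text file could point at each other as below. The file names, paths, xml:id values, folio range and bracketed placeholders are all hypothetical.

```xml
<!-- File 1 (hypothetical path): manuscripts/ms-a.xml, one per manuscript -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[description of manuscript A]</title>
      </titleStmt>
      <publicationStmt>
        <p>[publication details]</p>
      </publicationStmt>
      <sourceDesc>
        <msDesc xml:id="ms-a">
          <msIdentifier>
            <settlement>London</settlement>
            <repository>British Library</repository>
            <idno>[shelfmark]</idno>
          </msIdentifier>
          <msContents>
            <!-- selected contents only; each item points to its text file -->
            <msItem>
              <locus from="12r" to="15v">fols 12r-15v</locus>
              <title ref="../texts/text-001.xml">[title of the pamphlet text]</title>
            </msItem>
          </msContents>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <graphic url="images/ms-a-f012r.jpg"/>
  </facsimile>
</TEI>

<!-- File 2 (hypothetical path): texts/text-001.xml, one per text -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[title of the pamphlet text]</title>
      </titleStmt>
      <publicationStmt>
        <p>[publication details]</p>
      </publicationStmt>
      <sourceDesc>
        <!-- witnesses are listed by reference, not described again -->
        <listWit>
          <witness xml:id="A"><ptr target="../manuscripts/ms-a.xml#ms-a"/></witness>
          <witness xml:id="B"><ptr target="../manuscripts/ms-b.xml#ms-b"/></witness>
        </listWit>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>[diplomatic transcription]</p>
    </body>
  </text>
</TEI>
```

With a split like this, each manuscript would be described exactly once, and the website would have to resolve the pointers (in eXist, presumably with a query across both collections) to show witness details next to a text.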

We are also considering a halfway solution, where each transcribed text has a single witness (the one from which the text is taken) described in <msDesc>, plus references to the other witnesses, which are themselves described in separate XML docs. (I don’t much like this option.)

Do any of these approaches violate the spirit of TEI? Are there more obvious ways to do this? If you have experience creating a corpus like this, I’d be very grateful for advice on how best to organise the TEI infrastructure! Huge thanks. 


Dr Sebastiaan Verweij
Lecturer in Late-Medieval and Early Modern English Literature
University of Bristol 
(+44) (0) 117 92 88090