Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

Joe Wicentowski
Hi everyone,

Has anyone converted tei.2 files (ideally, those from the Library of Congress's American Memory project, DTD developed circa 1997[1]) to TEI P5?  I'm looking at a batch of ~1,700 files with transcripts of oral history interviews [2], and while I could rig up a script to perform a basic conversion, I thought I'd ask if there was any previous work in this area.

Thanks,
Joe

Re: Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

James Cummings
Hi Joe, 

Walking back to my hotel at the TEI Council meeting in Prague.  There is a p4top5 stylesheet at 
https://github.com/TEIC/Stylesheets/blob/dev/profiles/default/p4/from.xsl

It isn't perfect, especially where people have customised, and the P5 it produces might be slightly out of date, but that is where I'd start if someone hasn't converted these files already.

James 


--
Dr James Cummings, Academic IT Services, University of Oxford


[1] ftp://ftp.loc.gov/pub/american.memory/sgml/ammem2/INDEX 

Re: Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

Stuart A. Yeates
There is an entire category of XSLT scripts on the TEI wiki for handling different aspects of this transition: http://wiki.tei-c.org/index.php/Category:P4toP5

The one I personally used is at http://wiki.tei-c.org/index.php/P4toP5NZETC and is a mix of strict migration issues, fixing local issues and fixing things in the text that weren't checked for by the previous schema.

cheers
stuart 

--
...let us be heard from red core to black sky

Re: Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

Joe Wicentowski
Thanks, James, Stuart, and those who replied off list.  These look like great resources.

Having only started working with TEI just after the release of P5 in late 2007, I blindly assumed that these LOC documents, whose root element was <TEI2>, were necessarily circa P2.  But I gather from the comments that these appear P4-esque.  

I now see from the Vault that P4 used <TEI.2>, P3 used <tei.2>, P2 used <TEI.2>, and P1 used <teidoc> (!).  Some guideposts for the TEI forensicists...

The DTD comments note that the DTD was updated to P3, but P3 didn't use <TEI.2>.  The American Memory DTD notes: "LC staff renamed TEI element <TEI.2> as <TEI2>."  And, to boot, the samples I've looked at use <tei2>, not <TEI2>; the entity files referenced in each TEI file are all missing; and the DOCTYPE declarations are encased in comments.
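For anyone wanting to check root element names across a whole batch, here is a minimal sketch in Python (standard library only). The function name `root_names`, the sample strings, and the DTD filename in the commented-out DOCTYPE are all made up for illustration; they are not taken from the LOC files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def root_names(documents):
    """Count root element names across an iterable of XML strings."""
    counts = Counter()
    for xml_text in documents:
        # Comments before the root (such as a commented-out DOCTYPE)
        # are skipped by the parser; fromstring returns the root element.
        root = ET.fromstring(xml_text)
        counts[root.tag] += 1
    return counts

# Hypothetical samples: one with a commented-out DOCTYPE and a <tei2> root,
# one with a <TEI.2> root.
samples = [
    '<!-- <!DOCTYPE tei2 SYSTEM "ammem2.dtd"> -->\n'
    "<tei2><text><body><p>Q: Where were you posted?</p></body></text></tei2>",
    "<TEI.2><text><body><p>A: In Prague.</p></body></text></TEI.2>",
]

for name, count in root_names(samples).items():
    print(name, count)
```

Because XML names are case-sensitive, `tei2`, `TEI2` and `TEI.2` are tallied separately, which is the point when checking what the files actually contain.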

So it seems these LOC files represent a rather significant customization - some kind of hybrid.  

On the plus side, the DTD is well documented and the TEI files themselves appear to be all well-formed XML - so assuming they're internally consistent, it shouldn't be too hard to pull the data out into a basic P5-compliant form.
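As a rough illustration of what "pulling the data out" could involve, here is a minimal sketch in Python (standard library only). It is not the conversion script itself: the `RENAMES` table is a small, incomplete set of examples, the function name `to_p5` and the sample document are hypothetical, and the sketch assumes the input is well-formed XML with lowercase element names. It moves every element into the TEI P5 namespace and restores the camelCase spelling of a few header elements. It does not touch attributes (P5 expects `xml:id` and `xml:lang`, for instance) and does not guarantee valid P5.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative, incomplete mapping from lowercased legacy names to P5 names.
RENAMES = {
    "tei2": "TEI",
    "tei.2": "TEI",
    "teiheader": "teiHeader",
    "filedesc": "fileDesc",
    "titlestmt": "titleStmt",
    "publicationstmt": "publicationStmt",
    "sourcedesc": "sourceDesc",
}

def to_p5(xml_text):
    """Parse a legacy document and return its root, renamed into the TEI namespace."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        local = el.tag.lower()
        # Names not in the table are kept (lowercased) and only gain the namespace.
        el.tag = "{%s}%s" % (TEI_NS, RENAMES.get(local, local))
    return root

# Hypothetical sample document.
sample = (
    "<tei2><teiheader><filedesc><titlestmt><title>Interview</title>"
    "</titlestmt></filedesc></teiheader>"
    "<text><body><p>Q: Where were you posted?</p></body></text></tei2>"
)

ET.register_namespace("", TEI_NS)  # serialize with a default namespace
converted = to_p5(sample)
print(ET.tostring(converted, encoding="unicode"))
```

The result still needs validating against a P5 schema, since anything the rename table doesn't cover passes through unchanged.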

Thanks again!
Joe

Re: Conversion of tei.2 files to TEI P5 (LOC American Memory DTD)

Joe Wicentowski
Hi all,

Just a brief update: I've completed a basic conversion of the data in question and posted the conversion script, the resulting data, and an eXist app generated with TEI Publisher at https://github.com/joewiz/adst, along with some screenshots and directions.


If you haven't seen TEI Publisher, its project homepage is http://teipublisher.com.  

Thanks again for everyone's help,
Joe

