seeking links to TEI corpora

Lavin, Matthew J

Apologies for any duplicates received due to cross-posting.

I am collecting links for publicly accessible, computable TEI (or other similar XML markup such as SGM, LMNL) files. To be included, archives/collections/datasets/corpora must meet one of two criteria:

Bulk download of raw XML (not HTML-transformed)

XML fully accessible via a predictable URL structure (an example of this would be the Walt Whitman Archive, which has a “raw xml” link on every transformed HTML page)

Please note that I am not interested in sample XML, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!

Matthew Lavin

Clinical Assistant Professor of English and Director of Digital Media Lab

University of Pittsburgh

 

Re: seeking links to TEI corpora

Dalmau, Michelle Denise
Dear Matthew,

The IU Libraries provide XML downloads (at the item level) for the following TEI P5 collections:

Wright American Fiction: http://dlib.indiana.edu/collections/wright/
Victorian Women Writers Project: http://www.dlib.indiana.edu/collections/vwwp/
Brevier Legislative Reports: http://www.dlib.indiana.edu/collections/law/brevier/

We have two additional projects in TEI P4 with XML download:
Indiana Authors and Their Books: http://dlib.indiana.edu/collections/inauthors
Indiana Magazine of History: https://scholarworks.iu.edu/journals/index.php/imh (XML download in the View Text link per article)

You could also grab most of these files via GitHub:
https://github.com/iulibdcs/tei_text  (caveat: the repo needs to be refreshed; it is on our to-do list)

This is probably not what you are after, but we provide EAD XML access to IU finding aids as well:
http://dlib.indiana.edu/collections/findingaids/

—Michelle
-----
Michelle Dalmau
Head, Digital Collections Services
-----
Indiana University Libraries
Herman B Wells Library
1320 East 10th Street, Rm W501
Bloomington, Indiana 47405
-----
Web:  http://michelledalmau.com
Twitter:  @mdalmau


Re: seeking links to TEI corpora

Matthew Davis-2
In reply to this post by Lavin, Matthew J
Dear Matthew,

I don’t know that it’s what you’re looking for (it is still early days, there’s still a lot to transcribe and input, and I’m one person doing all the work), but I think my archive of Lydgate works may meet your criteria. There’s a link to download the XML for each transformed HTML page, and the raw XML files are stored under an XML folder, organized by work.

The link is www.minorworksoflydgate.net. Much of it is still behind a password as I’m hoping to have a peer review done on it, but the items in the Clopton chantry chapel (http://www.minorworksoflydgate.net/Quis_Dabit/Clopton/ww_qd_1.html and http://www.minorworksoflydgate.net/Testament/Clopton/sw_test_1.html) are readily accessible since the transcriptions will be published in January. If it’s what you’re looking for, send me a message off-list and I’ll give you the password for the other items.

There’s also a section on the site, “About the Archive,” that articulates some of my thinking about site design, the decisions I made while encoding, etc.

All the best,
—Matt



Re: seeking links to TEI corpora

MLH
In reply to this post by Lavin, Matthew J

Hi Matthew,


Greta Franzini's digital scholarly editions app links to 16 resources with downloadable TEI, but I'm afraid it doesn't say whether they offer bulk download or predictable URLs. Still, it may be worth a look.

https://dig-ed-cat.acdh.oeaw.ac.at/browsing/editions/?name=&institution__name=&manager__name=&url=&scholarly=&digital=&edition=&writing_support=&begin_date=&end_date=&audience=&philological_statement=&textual_variance=&value_witnesses=&tei_transcription=&download=1&images=&zoom_images=&image_manipulation=&text_image=&source_translation=&glossary=&indices=&search=&advanced_search=&cc_license=&open_source=&infrastructure=&key_or_ocr=&print_friendly=&api=&amount=&Filter=Filter


Matthew



Re: seeking links to TEI corpora

Christian Thomas

Hi Matthew, a source currently missing in Greta's catalogue (we will add it there asap) is the Deutsches Textarchiv / German Text Archive, http://www.deutschestextarchiv.de/. It should fulfill your criteria:


* Bulk download of raw XML (not HTML-transformed): http://www.deutschestextarchiv.de/download (2435 documents)
** API: cf. http://www.deutschestextarchiv.de/api
* Predictable URL structure: http://www.deutschestextarchiv.de/book/download_xml/[DTA-dirname]
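
As a rough illustration of what a predictable URL structure makes possible, here is a minimal Python sketch that turns a list of DTA directory names into raw-XML download URLs. Only the base path comes from the pattern above; the function names and the example directory names are hypothetical, and the sketch only builds URLs without fetching anything.

```python
from urllib.parse import quote

# Base path taken from the URL pattern quoted in this message.
DTA_XML_BASE = "http://www.deutschestextarchiv.de/book/download_xml/"


def dta_xml_url(dirname: str) -> str:
    """Return the raw-XML download URL for one DTA directory name."""
    name = dirname.strip()
    if not name:
        raise ValueError("empty DTA directory name")
    # Percent-encode the name so unusual characters cannot break the URL.
    return DTA_XML_BASE + quote(name, safe="")


def dta_xml_urls(dirnames):
    """Map directory names to download URLs, skipping blank entries."""
    return [dta_xml_url(d) for d in dirnames if d.strip()]


if __name__ == "__main__":
    # Hypothetical directory names, for illustration only.
    for url in dta_xml_urls(["example_work_1800", "another_work_1850"]):
        print(url)
```

The real directory names would have to come from the archive itself, for example from the bulk download or the API mentioned above.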


Best wishes
Christian Thomas





Re: seeking links to TEI corpora

Mathias Göbel
In reply to this post by Lavin, Matthew J

Dear Matthew,

For German, there is the German Text Archive, which provides complete dumps of its texts; see: http://www.deutsches-textarchiv.de/download

You can also download a complete zip file of the "Digital Library" hosted at textgridrep.org, a huge collection of German-language literature (https://textgrid.de/digitale-bibliothek), or you can use the OAI-PMH interface of the hosting repository.

There are many more out there, for example Paul Fièvre's repo on GitHub containing French theatre plays: https://github.com/dramacode/theatre-classique
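
Once a dump or a cloned repository like the ones above is on disk, a quick way to see what it contains is to walk the directory and keep only files whose root element looks like TEI. The following is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, and it assumes TEI P5 files use the namespace http://www.tei-c.org/ns/1.0 with a TEI or teiCorpus root, while P4 files use an un-namespaced TEI.2 or teiCorpus.2 root.

```python
import os
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Root elements treated as TEI here: P5 (namespaced) and P4 (no namespace).
TEI_ROOTS = {
    f"{{{TEI_NS}}}TEI",
    f"{{{TEI_NS}}}teiCorpus",
    "TEI.2",
    "teiCorpus.2",
}


def root_tag(path):
    """Return the root element's tag, or None if the file cannot be parsed."""
    try:
        with open(path, "rb") as fh:
            # Stop at the first "start" event, which is the root element.
            for _event, elem in ET.iterparse(fh, events=("start",)):
                return elem.tag
    except (ET.ParseError, OSError):
        return None
    return None


def find_tei_files(top):
    """Return sorted paths of .xml files under `top` with a TEI root element."""
    hits = []
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith(".xml"):
                path = os.path.join(dirpath, name)
                if root_tag(path) in TEI_ROOTS:
                    hits.append(path)
    return sorted(hits)
```

Calling find_tei_files on the unpacked download directory gives a list of candidate files; anything that is XML but not TEI (or not well-formed at all) is simply left out.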

Best,
Mathias


 


--
Mathias Göbel
Abt. Forschung & Entwicklung

Georg-August-Universität Göttingen
Niedersächsische Staats- und Universitätsbibliothek Göttingen
D-37070 Göttingen

Papendiek 14 (hist. Gebäude, Raum 2.408)
+49 551 39-20184 (Tel.)
+49 551 39-33856 (Fax.)

[hidden email]
http://www.sub.uni-goettingen.de

Re: seeking links to TEI corpora

Stuart A. Yeates
In reply to this post by MLH
A couple of years ago I gathered some URLs into a project for testing.

https://github.com/stuartyeates/sampler/tree/master/TEI

Cheers
Stuart

On Wednesday, December 21, 2016, MLH <[hidden email]> wrote:

Hi Matthew,


Greta Franzini's digital scholarly editions app links to 16 resources with downloadable TEI, but I'm afraid it doesn't specify regarding bulk download / predictable URLs. Still, it may be worth a look.

https://dig-ed-cat.acdh.oeaw.ac.at/browsing/editions/?name=&institution__name=&manager__name=&url=&scholarly=&digital=&edition=&writing_support=&begin_date=&end_date=&audience=&philological_statement=&textual_variance=&value_witnesses=&tei_transcription=&download=1&images=&zoom_images=&image_manipulation=&text_image=&source_translation=&glossary=&indices=&search=&advanced_search=&cc_license=&open_source=&infrastructure=&key_or_ocr=&print_friendly=&api=&amount=&Filter=Filter


Matthew


From: TEI (Text Encoding Initiative) public discussion list <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;TEI-L@LISTSERV.BROWN.EDU&#39;);" target="_blank">TEI-L@...> on behalf of TEI-L automatic digest system <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;LISTSERV@LISTSERV.BROWN.EDU&#39;);" target="_blank">LISTSERV@...>
Sent: 20 December 2016 05:00
To: <a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;TEI-L@LISTSERV.BROWN.EDU&#39;);" target="_blank">TEI-L@...
Subject: TEI-L Digest - 18 Dec 2016 to 19 Dec 2016 (#2016-244)
 
There are 4 messages totaling 541 lines in this issue.

Topics of the day:

  1. seeking links to TEI corpora (3)
  2. Don't upgrade your Oxygen plugin yet!

----------------------------------------------------------------------

Date:    Mon, 19 Dec 2016 17:13:29 +0000
From:    "Lavin, Matthew J" <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;lavin@PITT.EDU&#39;);" target="_blank">lavin@...>
Subject: seeking links to TEI corpora

Apologies for any duplicates received due to cross-posting.

I am collecting links for publicly accessible, computable TEI (or other similar xml markup such as SGM, LMNL) files. In order to be included, archives/collections/datasets/corpora must have meet one of the two criteria:

Bulk download of raw xml (not html transformed)
Xml fully accessible via predictable url structure (an example of this would be the Walk Whitman archive, which as a “raw xml” link on every transformed html page)

Please note that I am not interested in sample xml, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!

Matthew Lavin
Clinical Assistant Professor of English and Director of Digital Media Lab
University of Pittsburgh

------------------------------

Date:    Mon, 19 Dec 2016 09:33:09 -0800
From:    Martin Holmes <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;mholmes@UVIC.CA&#39;);" target="_blank">mholmes@...>
Subject: Don't upgrade your Oxygen plugin yet!

Hi all,

We've found a problem with the latest release of the TEI Oxygen plugin,
so if you have it installed (instead of the regular TEI framework
bundled with Oxygen), please don't update it to the new release. We're
working on the problem.

Cheers,
Martin

------------------------------

Date:    Mon, 19 Dec 2016 18:05:19 +0000
From:    "Dalmau, Michelle Denise" <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;mdalmau@INDIANA.EDU&#39;);" target="_blank">mdalmau@...>
Subject: Re: seeking links to TEI corpora

Dear Matthew,

The IU Libraries provide XML downloads (at the item-level) for the following TEI P5 collections:

Wright American Fiction: http://dlib.indiana.edu/collections/wright/
Lyle H. Wright, a librarian at the Huntington Library in San Marino, CA, created a bibliography of American fiction from the years 1851–1875, published as American ...

Victorian Women Writers Project: http://www.dlib.indiana.edu/collections/vwwp/
The Victorian Women Writers Project (VWWP) began in 1995 at Indiana University and is primarily concerned with the exposure of lesser-known British women writers of ...

Brevier Legislative Reports: http://www.dlib.indiana.edu/collections/law/brevier/

We have two additional projects in TEI P4 with XML download:
Indiana Authors and Their Books: http://dlib.indiana.edu/collections/inauthors
Indiana Authors and Their Books is an LSTA–funded project based on the digitization and encoding of the 3–volume reference work, Indiana Authors and Their Books ...

Indiana Magazine of History: https://scholarworks.iu.edu/journals/index.php/imh (XML download in the View Text link per article)
Published continuously since 1905, the Indiana Magazine of History is one of the nation's oldest historical journals. Since 1913, the IMH has been edited and ...


You could also grab most of these files via GitHub:
https://github.com/iulibdcs/tei_text  (caveat: the repo needs to be refreshed — on our to-do list)
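
For anyone checking whether files pulled from a repository like this are computable TEI, here is a minimal Python sketch. It assumes a local folder of `.xml` files (for example, a clone of the repository); the function names `tei_title` and `titles_in_folder` are illustrative and not part of any IU tooling. It only recognizes namespaced TEI P5, so P4 files (which have no TEI namespace, and may fail to parse if they rely on DTD entities) come back as None.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_title(xml_bytes):
    """Return the first <title> inside the teiHeader, or None.

    None means the root has no namespaced teiHeader child
    (e.g. a TEI P4 file) or the header has no <title>.
    """
    root = ET.fromstring(xml_bytes)
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return None
    title = header.find(".//tei:title", TEI_NS)
    if title is None:
        return None
    # Collapse runs of whitespace left over from pretty-printed markup.
    return " ".join("".join(title.itertext()).split())


def titles_in_folder(folder):
    """Map each .xml file name under `folder` to its title (or None)."""
    results = {}
    for path in sorted(Path(folder).rglob("*.xml")):
        try:
            results[path.name] = tei_title(path.read_bytes())
        except ET.ParseError:
            # Not well-formed, or uses entities the parser cannot resolve.
            results[path.name] = None
    return results
```

Calling `titles_in_folder("tei_text")` on a local clone (the folder name is an assumption) would give a quick picture of which files parse as P5 and which need other handling.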


This is probably not what you are after, but we provide EAD XML access to IU finding aids as well:
http://dlib.indiana.edu/collections/findingaids/


—Michelle
-----
Michelle Dalmau
Head, Digital Collections Services
-----
Indiana University Libraries
Herman B Wells Library
1320 East 10th Street, Rm W501
Bloomington, Indiana 47405
-----
Web:  http://michelledalmau.com
Twitter:  @mdalmau


On Dec 19, 2016, at 12:13 PM, Lavin, Matthew J <lavin@pitt.edu> wrote:

Apologies for any duplicates received due to cross-posting.

I am collecting links for publicly accessible, computable TEI (or other similar xml markup such as SGM, LMNL) files. In order to be included, archives/collections/datasets/corpora must meet one of the two criteria:

Bulk download of raw xml (not html transformed)
Xml fully accessible via predictable url structure (an example of this would be the Walt Whitman archive, which has a “raw xml” link on every transformed html page)

Please note that I am not interested in sample xml, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!

Matthew Lavin
Clinical Assistant Professor of English and Director of Digital Media Lab
University of Pittsburgh





------------------------------

Date:    Mon, 19 Dec 2016 10:08:46 -0800
From:    Matthew Davis <matthew@...>
Subject: Re: seeking links to TEI corpora

Dear Matthew,

I don’t know that it’s what you’re looking for (it is still early days, there’s still a lot to transcribe and input, and I’m one person doing all the work), but I think my archive of Lydgate works may meet your criteria. There’s a link to download the XML for each transformed HTML page, and the raw XML files are stored under an XML folder by work.

The link is http://www.minorworksoflydgate.net/. Much of it is still behind a password, as I’m hoping to have a peer review done on it, but the items in the Clopton chantry chapel (http://www.minorworksoflydgate.net/Quis_Dabit/Clopton/ww_qd_1.html and http://www.minorworksoflydgate.net/Testament/Clopton/sw_test_1.html) are readily accessible since the transcriptions will be published in January. If it’s what you’re looking for, send me a message off-list and I’ll give you the password for the other items.


There’s also a section on the site, “About the Archive,” that articulates some of my thinking about site design, the decisions I made while encoding, etc.

All the best,
—Matt


> On Dec 19, 2016, at 9:13 AM, Lavin, Matthew J <lavin@...> wrote:
>
> Apologies for any duplicates received due to cross-posting.

> I am collecting links for publicly accessible, computable TEI (or other similar xml markup such as SGM, LMNL) files. In order to be included, archives/collections/datasets/corpora must meet one of the two criteria:

> Bulk download of raw xml (not html transformed)
> Xml fully accessible via predictable url structure (an example of this would be the Walt Whitman archive, which has a “raw xml” link on every transformed html page)

> Please note that I am not interested in sample xml, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!

> Matthew Lavin
> Clinical Assistant Professor of English and Director of Digital Media Lab
> University of Pittsburgh


------------------------------

End of TEI-L Digest - 18 Dec 2016 to 19 Dec 2016 (#2016-244)
************************************************************


--
--
...let us be heard from red core to black sky