Following on from the previous discussion (and turning things on their head), I'm thinking of developing a service which harvests listPerson data and publishes it as a Linked Data resource. I've found the TEI examples page:
but this isn't massively helpful in giving me programmatic access to a set of freely licensed TEI documents to scan. Is there a 'VoID sitemap' of the TEI world, and if not, shouldn't there be?
Fair question: it's something from the Linked Data world (which I would like the TEI community to be a part of, in some sense):
They're talking there about referencing RDF datasets, but the principle could equally apply to TEI documents. Think XML Catalogs with additional useful metadata about each document.
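To make the idea concrete, a 'VoID sitemap' entry for one project's TEI holdings might look something like the sketch below. All the URIs and titles here are placeholders invented for illustration; void:Dataset, void:dataDump and void:exampleResource are real VoID vocabulary terms.

```turtle
# Hypothetical VoID description of a TEI document collection.
# Every URI below is a placeholder, not a real resource.
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/tei#collection>
    a void:Dataset ;
    dcterms:title  "Example TEI document collection" ;
    dcterms:format "application/tei+xml" ;
    void:dataDump  <http://example.org/tei/all-documents.zip> ;
    void:exampleResource <http://example.org/tei/persons.xml> .
```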
The core idea with Linked Data is that entities of interest have
fixed, dereferenceable URIs. Dereferenceable means that you can
request the URI (possibly in a special way) and get
machine-processible data back. I would like to see TEI documents
having this level of accessibility. It needn't be hard: just post
the XML document on the web and publish its URL.
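From a client's side, dereferencing such a URI is a one-liner. The sketch below uses a placeholder URI; the Accept header assumes (hypothetically) a server that honours content negotiation for TEI XML.

```python
from urllib.request import Request

# Placeholder URI standing in for a published TEI document's address.
uri = "https://example.org/tei/persons.xml"

# Dereference with content negotiation: ask for TEI XML specifically.
req = Request(uri, headers={"Accept": "application/tei+xml, application/xml;q=0.9"})

# urllib.request.urlopen(req) would fetch the document; omitted here
# because example.org serves no actual TEI.
print(req.get_header("Accept"))
```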
When I went looking for examples of <listPerson> last night, I found the TEI samples page. I have no idea which of the resources listed there contains <listPerson> examples, so I had to go down the page and:
This is hardly 'programmatic access'. A key idea with Linked
Data is that you (or rather a software agent) can 'follow its
nose', looking up Linked Data URIs, parsing the resources they
point to, finding mentions of further resources within the first
result and moving on to those. It's treating the Web as a
distributed database, and I would like TEI documents to be BLOB
fields within that distributed database. One obvious use case
would be Open Annotation.
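One 'follow its nose' step can be sketched offline: parse a TEI fragment and collect the @ref URIs that a crawler would dereference next. The fragment and its URIs below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A toy TEI fragment; the ref URI is a placeholder, not a real resource.
doc = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1">
    <persName ref="http://example.org/people/george-abbot">George Abbot</persName>
  </person>
</listPerson>"""

def linked_uris(xml_text):
    """Collect @ref URIs: the next resources a crawler would look up."""
    root = ET.fromstring(xml_text)
    return [el.get("ref") for el in root.iter() if el.get("ref")]

print(linked_uris(doc))  # ['http://example.org/people/george-abbot']
```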
On 21/11/2017 23:17, Lou Burnard wrote:
On 21/11/17 21:30, Richard Light wrote:
On 22/11/17 09:18, Richard Light wrote:
The core idea with Linked Data is that entities of interest have fixed, dereferenceable URIs. Dereferenceable means that you can request the URI (possibly in a special way) and get machine-processible data back. I would like to see TEI documents having this level of accessibility. It needn't be hard: just post the XML document on the web and publish its URL.
I vaguely remember that Stuart Yeates had a scripted procedure for automatically detecting TEI files available on the internet a few years back (his project at https://github.com/stuartyeates/sampler is mentioned on the wiki page you quote) -- he might have some helpful comment here. The major problem I see, in practice, is the strange reluctance lots of TEI projects still have to expose their TEI source directly.
When I went looking for examples of <listPerson> last night, I found the TEI samples page. I have no idea which of the resources listed there contains <listPerson> examples,
Well, of course, if you download the source you could always run an XPath query to find this pretty quickly. You don't even need to download the whole resource, if you can believe what its <tagUsage> element says.
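Lou's <tagUsage> shortcut can be sketched like this, assuming (hypothetically) a project whose teiHeader carries a filled-in <tagsDecl>; only the header need be fetched, never the full text. The header below is invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# A cut-down teiHeader with a hypothetical <tagUsage> summary.
header = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc><tagsDecl>
    <namespace name="http://www.tei-c.org/ns/1.0">
      <tagUsage gi="listPerson" occurs="2"/>
      <tagUsage gi="persName" occurs="153"/>
    </namespace>
  </tagsDecl></encodingDesc>
</teiHeader>"""

def uses_element(header_xml, gi):
    """True if <tagUsage> declares at least one occurrence of the element gi."""
    root = ET.fromstring(header_xml)
    return any(t.get("gi") == gi and int(t.get("occurs", "0")) > 0
               for t in root.iter(TEI + "tagUsage"))

print(uses_element(header, "listPerson"))  # True
```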
I would very much welcome something along these line.
On 24/11/2017 11:42, Paterson, Duncan wrote:
One would think so. Though a quick glance at this resource suggests that its name is sadly indicative of the content - lots of 'samples' in there. Also the advanced search falls over when asked to find things.
Dear Richard and list,
There has been some incentive to work on interchangeability of data from digital edition projects lately, and without getting into details I can think of an Andrew Mellon grant opportunity whose deadline I missed last July that might have incentivized the construction of shared practices within a community of datasets—at least that was how I’d have interpreted the grant. As I understood it, we might consider a community to be a topical area that works with data from digital editions and archives in much the same way and relies on each other’s work for its scholarly discourse, where federated searching would make sense. The problem with imagining the entire TEI community’s prosopography data as part of a single linked open data set is that we don’t always store the same kinds of data in the same fashion—TEI gives us perhaps a little too much freedom for that, although it’s certainly possible to build a kind of digital scaffolding that might communicate by building “cross-walks”. I imagine that most of us are committed by institutions and funding models to concentrate on building our own archives in our own ways, and while we recognize the benefits of federated searching and linked open data, we haven’t project-by-project expressed a practical commitment to it, though we should.
This is something I know that I want to work on in the projects I’m part of. The Digital Mitford project, my most distributed project (shared with participants at multiple institutions with no single institution as its home), has as its backbone and probably most compelling raison d’etre a prosopography list that we’re developing from named entities we pull from letters, poetry, drama, and other literary texts—with historical persons, fictional and archetypal characters, and even named animals as “persons”, as well as places, named events, and many kinds of named documents, as a set of thousands of entries that serves as the central “nexus” point for studying and interlinking the digital editions we’re preparing. In some ways, we’ve probably spent just as much time compiling, pruning, correcting, de-duping, and disambiguating this giant mess of a prosopography “spinal column” as we’ve been able to devote to our TEI representation of manuscripts and published documents. (See for my part http://digitalmitford.org/si.xml.) I’m sure our project isn’t alone in the effort we’ve devoted, and the inconsistencies within it. What’s specifically scary to me about RDF is the kind of commitment it requires—one’s data, it seems, must ideally be “finished” and reliable to contribute reliably to the LOD world, when we’re always in progress and revising and checking for errors. I’d like to be building the scaffolding to make my data linkable, with the freedom to make repairs—and perhaps that’s the reason I study what I need to do to contribute to RDF and then step back a while until I’ve reliably identified the data that I know is most reliable for that linking.
Were we somehow to incentivize interchangeability, I would caution us against making all projects from TEI to be expressing linkages the same way. There’s a lot of diversity in our code base related to the many different eras and kinds of “documents” (or marked structures, monuments, etc) that we work with. I’d rather see regions and topics emerge that share ways of expressing the linkability of their data—in ways that make it easier for us to work together within, say, our “kinship network” of related projects. I wonder, Richard, if you have advice or thoughts on this!
Elisa Beshero-Bondar, PhD
Director, Center for the Digital Text | Associate Professor of English
University of Pittsburgh at Greensburg | Humanities Division
150 Finoli Drive
Greensburg, PA 15601 USA
Development site: http://newtfire.org
On 25/11/2017 15:20, Elisa Beshero-Bondar wrote:
Dear Richard and list,

I think that diversity is a strength, not a problem. The challenge, as I see it, is to establish as much of a shared frame of reference as is feasible and desirable.
In this case, we want enough information about each person to enable us to make a judgement as to whether the 'George Abbot' mentioned by another project is the same person as 'our' George Abbot. Ideally the biographical details will be provided in a machine-processible format, such that software agents can make that judgement (at least on a probabilistic basis). Deathless prose would, however, be a good second best. The commitment to a common framework should, I suggest, be stronger where there is shared interest in the material.
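A toy illustration of such a probabilistic judgement, with invented weights and a stdlib string-similarity measure standing in for a real matching model:

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Crude estimate that two person records denote the same individual:
    name similarity, corroborated (or undercut) by birth dates.
    The weights here are invented for illustration only."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if a.get("birth") and b.get("birth"):
        return name_sim * (1.0 if a["birth"] == b["birth"] else 0.3)
    return name_sim * 0.8   # no dates to corroborate: downgrade

ours   = {"name": "George Abbot",  "birth": "1562"}
theirs = {"name": "George Abbott", "birth": "1562"}
print(match_score(ours, theirs) > 0.5)  # True: likely the same person
```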
The world of literature, and the people who inhabit it, would be an obvious use case for sharing in this way.
Thanks for the link: I will have a go at processing it when time permits.
There is no need for a death-or-glory leap in the direction of RDF. What I am currently doing is to grab a <listPerson> element as an XML document, and then import its content into our Modes database, converting each <person> element into a free-standing XML record/document in a biographical format. An XSLT transform converts the TEI markup into our 'person' application markup. As and when I come to publish this resource as Linked Data, another XSLT transform will convert this XML to suitably structured RDF on the fly. That transform can be adjusted over time as the requirements for RDF evolve. So far the data has only made a transition from one XML format to another.
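The kind of transform described might look, in skeletal form, like the sketch below. The target <PersonRecord> markup is invented here, since the real Modes 'person' application format isn't shown in the thread.

```xml
<!-- Sketch only: <PersonRecord> and its children are hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:template match="tei:person">
    <PersonRecord id="{@xml:id}">
      <Name><xsl:value-of select="tei:persName"/></Name>
      <Born><xsl:value-of select="tei:birth/@when"/></Born>
    </PersonRecord>
  </xsl:template>
</xsl:stylesheet>
```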
Were we somehow to incentivize interchangeability, I would caution us against making all projects from TEI to be expressing linkages the same way. There’s a lot of diversity in our code base related to the many different eras and kinds of “documents” (or marked structures, monuments, etc) that we work with. I’d rather see regions and topics emerge that share ways of expressing the linkability of their data—in ways that make it easier for us to work together within, say, our “kinship network” of related projects. I wonder, Richard, if you have advice or thoughts on this!

I'm involved with the Linked Pasts project/workgroup (an outgrowth of the Pelagios project), which is aiming to develop an 'interconnection format' for person data to complement the format it is successfully using for historical gazetteers. This works on the assumption that each project will work in its own way, but that there is a core set of data which it is highly desirable to have for the purposes of disambiguating places (or people).