Place(s) for idno[@type="DOI"]


Andreas Wagner
Dear list,

I am currently putting the finishing touches on (v0.2 of) a tool I have been
writing. The tool is meant to run as a web service that ingests TEI XML
files, extracts metadata, and uploads each file as a new deposit to
Zenodo [1], a free research data repository.

Zenodo provides DOIs to its deposits and the webservice can acquire
this DOI and insert it into the TEI file before submitting it. I would
like to make sure I am not missing something obvious when I do this in
the following way:

- if there are no <idno> children of
   /TEI/teiHeader/fileDesc/publicationStmt yet, add an <idno
   type="DOI">{new zenodo doi}</idno> element as the last child of
   <publicationStmt>.

- if /TEI/teiHeader/fileDesc/publicationStmt *does* have <idno> children
   and the first such child *does not* contain <idno> children in turn,
   add an <idno type="DOI">{new zenodo doi}</idno> element as the new
   first <idno> sibling of the publicationStmt/idno elements.

- if the first <idno> element *does* contain at least one <idno> child in
   turn, add an <idno type="DOI">{new zenodo doi}</idno> element as the new
   last <idno> sibling of the "second-level" publicationStmt/idno/idno
   elements.

(Any existing <idno type="DOI">{some other doi}</idno> element is
removed if it is found at
/TEI/teiHeader/fileDesc/publicationStmt/idno[@type='DOI'] or
/TEI/teiHeader/fileDesc/publicationStmt/idno[1]/idno[@type='DOI'].
Only the first element matched by these tests is removed.)
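
The placement rules above could be sketched in Python roughly as follows.
This is only an illustration using the standard library's ElementTree, not
code taken from the tool. The function name insert_doi is made up, and the
sketch assumes that the old DOI is removed first and that the three rules
are then evaluated against what is left.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
IDNO = f"{{{TEI_NS}}}idno"
PUB_STMT = (f"{{{TEI_NS}}}teiHeader/{{{TEI_NS}}}fileDesc/"
            f"{{{TEI_NS}}}publicationStmt")


def idno_children(parent):
    """Return the direct <idno> children of an element, in document order."""
    return [child for child in parent if child.tag == IDNO]


def insert_doi(root, doi):
    """Apply the three placement rules to a parsed <TEI> root element."""
    pub = root.find(PUB_STMT)
    if pub is None:
        raise ValueError("no publicationStmt found")

    # Remove only the first old DOI: top level first, then inside idno[1].
    top = idno_children(pub)
    searches = [(pub, top)]
    if top:
        searches.append((top[0], idno_children(top[0])))
    for parent, candidates in searches:
        old = next((c for c in candidates if c.get("type") == "DOI"), None)
        if old is not None:
            parent.remove(old)
            break

    new = ET.Element(IDNO, {"type": "DOI"})
    new.text = doi

    top = idno_children(pub)  # re-read after the removal (an assumption)
    if not top:
        pub.append(new)  # rule 1: last child of publicationStmt
    else:
        nested = idno_children(top[0])
        if not nested:
            # rule 2: new first <idno> among publicationStmt's children
            pub.insert(list(pub).index(top[0]), new)
        else:
            # rule 3: directly after the last second-level <idno>
            top[0].insert(list(top[0]).index(nested[-1]) + 1, new)
    return new
```

Whether the rules should be tested before or after the old DOI is removed
is not stated above. The sketch picks "after", so that a file whose only
<idno> was the old DOI falls under the first rule.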

Obviously, this is fragile, because so much of it depends on elements
being in first or last position (or occurring only once), and this is
something I plan to fix anyway.

But would you recommend writing those <idno> elements to completely
different places? Or handling old DOI information differently? What's the
policy on multiple DOI identifiers anyway? Would you consider them a
problem?

It would be an immense help to get some feedback on whether I am
generally working in a good direction or not.


Thanks a lot,

Andreas


[1] https://about.zenodo.org/

PS. For what it's worth, if you want to have a look at the current state
of affairs, it should be publicly accessible at https://gitlab.gwdg.de/rg-mpg-de/tei2zenodo


--
Dr. Andreas Wagner                          twitter: @anwagnerdreas
Project "The School of Salamanca"           web: http://salamanca.adwmainz.de
Academy of Sciences and Literature, Mainz   fon: +49 (0)69/798-32774
and Institute of Philosophy                 fax: +49 (0)69/798-32794
Goethe University Frankfurt

IGF HP 25 / R 2.455
Norbert-Wollheim-Platz 1
60629 Frankfurt am Main

Re: Place(s) for idno[@type="DOI"]

Peter Boot-3
Dear Andreas,

Just one quick thought: I'm not sure I would like a file that I upload to be modified by the uploading process. For one thing: it would create a difference between the file on Zenodo and the file as it is on our version control system, making it impossible to establish the integrity of the file on Zenodo. And TEI happens to have a facility for recording IDNOs, but most filetypes of course do not. You wouldn't expect to find the DOI in an uploaded spreadsheet, I assume.  

Best,
Peter

________________________________________
From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Andreas Wagner <[hidden email]>
Sent: Wednesday, 10 June 2020 17:06
To: [hidden email]
Subject: [TEI-L] Place(s) for idno[@type="DOI"]


Re: Place(s) for idno[@type="DOI"]

lou
FWIW, when an ELTeC repo gets published to zenodo, its <publicationStmt> gets rewritten like this:
~~~~
<publicationStmt>
  <publisher ref="https://distant-reading.net">COST Action "Distant Reading for European Literary History" (CA16204)</publisher>
  <distributor ref="https://zenodo.org/communities/eltec/">Zenodo.org</distributor>
  <date when="2019-11-04" />
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/" />
  </availability>
  <ref type="doi" target="https://doi.org/10.5281/zenodo.3462435" />
</publicationStmt>
~~~~
We chose to use <ref> rather than <idno>, largely out of reluctance to add another new element to our schema.
For me, the real problem is that we don't know what the DOI is going to be before we deposit the document, so we can't include the DOI in it till after it's been deposited. Which means that when we make the change to include the DOI, it's not the same document as the one in the archive any more, so we should redeposit it.... Hmmm. Catch22....



On Wed, 10 Jun 2020 at 16:38, Peter Boot <[hidden email]> wrote:

Re: Place(s) for idno[@type="DOI"]

Andreas Wagner
Dear Peter, Lou, and list,

* Lou Burnard dixit [2020-06-10 17:54]:
> For me, the real problem is that we don't know what the DOI is going
> to be before we deposit the document, so we can't include the DOI in
> it till after it's been deposited. Which means that when we make the
> change to include the DOI, it's not the same document as the one in
> the archive any more, so we should redeposit it.... Hmmm. Catch22....

The Zenodo API allows one to acquire a DOI before uploading, which is what
I am doing. I create a "blank" deposit, for which a DOI is reserved. I add
this DOI to my TEI file and then add the file and the metadata to the
deposit. Finally, I can either publish the deposit or leave it in the
unpublished state and finish whatever I want by logging in to Zenodo
with a browser and going to my "Uploads"...
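
The first step of that workflow (create a blank deposit, read the reserved
DOI) might look roughly like this in Python. It is a sketch under
assumptions, not the tool's code: the endpoint URL, the bearer-token header
and the metadata.prereserve_doi.doi field reflect my reading of Zenodo's
REST API documentation and should be checked against it. The function names
are made up for illustration.

```python
import json
from urllib import request

# Assumed endpoint of Zenodo's deposit API (the sandbox uses another host).
DEPOSITIONS_URL = "https://zenodo.org/api/deposit/depositions"


def new_deposition_request(token):
    """Build (but do not send) the request that creates an empty deposition."""
    return request.Request(
        DEPOSITIONS_URL,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def reserved_doi(deposition):
    """Read the pre-reserved DOI from the JSON returned for a new deposition.

    The key names are an assumption about the response layout.
    """
    return deposition["metadata"]["prereserve_doi"]["doi"]


def reserve_doi(token, opener=request.urlopen):
    """Create a blank deposition and return (deposition id, reserved DOI)."""
    with opener(new_deposition_request(token)) as response:
        deposition = json.load(response)
    return deposition["id"], reserved_doi(deposition)
```

The DOI returned here would then be written into the TEI file before the
file and its metadata are added to the same deposition.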

Also, the way I imagine this is that the service is called by a GitHub
webhook whenever there is a push to the repository. I filter on
patterns in repo/branch/pusher/commit message and then, if applicable,
retrieve the files affected by the push from GitHub. When all is done, I
add the file, now containing the new DOI, back to GitHub so that the most
recent version there is the same as the one on Zenodo. (And the service of
course ignores the push webhook triggered by this.)
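
A filter of this kind could be sketched in Python as below. The field names
(ref, repository.full_name, pusher.name, commits[].message/added/modified)
are those of GitHub's push event payload. Everything in CONFIG is
hypothetical, including the repository name, the "[deposit]" marker and the
bot account name, and recognising the service's own push by its pusher name
is just one assumed way of breaking the loop.

```python
import re

# Hypothetical filter settings; the real tool's configuration may differ.
CONFIG = {
    "repo": re.compile(r"^myorg/editions$"),
    "branch": re.compile(r"^refs/heads/main$"),
    "pusher": re.compile(r".*"),
    "message": re.compile(r"\[deposit\]"),
    "bot_name": "tei2zenodo-bot",
}


def files_to_deposit(event, config=CONFIG):
    """Return the XML files touched by a push event, or [] to ignore it."""
    pusher = event["pusher"]["name"]
    if pusher == config["bot_name"]:
        return []  # our own write-back of the DOI: do not process it again
    if not (config["repo"].search(event["repository"]["full_name"])
            and config["branch"].search(event["ref"])
            and config["pusher"].search(pusher)):
        return []
    files = []
    for commit in event.get("commits", []):
        if not config["message"].search(commit["message"]):
            continue
        for path in commit.get("added", []) + commit.get("modified", []):
            if path.endswith(".xml") and path not in files:
                files.append(path)
    return files
```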

All of this already works, although it has not been thoroughly tested
yet.

Best,

Andreas



Re: Place(s) for idno[@type="DOI"]

Andreas Wagner
In reply to this post by lou
Dear Lou and list,

* Lou Burnard dixit [2020-06-10 17:54]:
>~~~~
> <publicationStmt>
>   <ref type="doi" target="https://doi.org/10.5281/zenodo.3462435" />
> </publicationStmt>
>~~~~
> We chose to use <ref> rather than <idno>, largely out of reluctance to
> add another new element to our schema.

That's a good point and seems -- if it reflects an at least somewhat
common usage -- to indicate that it's not possible to anticipate (nor
desirable to enforce) a canonical place for the DOI.

I already have a user configuration based on XPaths in place in order to
retrieve the metadata that zenodo requires, so maybe I will have to use
a mechanism like that also to manage the DOI insertion/update.
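
Such a configurable mechanism might be sketched in Python as follows. The
rule format (path, attribute, prefix) is invented for illustration and is
not the tool's actual configuration. The two example rules cover the
<idno type="DOI"> convention and the <ref type="doi" target="..."/>
convention quoted above. ElementTree only supports a subset of XPath, which
is enough for these paths.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
PUB = "tei:teiHeader/tei:fileDesc/tei:publicationStmt"

# Hypothetical per-project rules: where the DOI lives and in which form.
IDNO_RULE = {"path": PUB + "/tei:idno[@type='DOI']",
             "attribute": None, "prefix": ""}
REF_RULE = {"path": PUB + "/tei:ref[@type='doi']",
            "attribute": "target", "prefix": "https://doi.org/"}


def read_doi(root, rule):
    """Return the bare DOI found at the configured location, or None."""
    node = root.find(rule["path"], NS)
    if node is None:
        return None
    raw = node.get(rule["attribute"]) if rule["attribute"] else node.text
    raw = (raw or "").strip()
    if raw.startswith(rule["prefix"]):
        raw = raw[len(rule["prefix"]):]
    return raw or None


def write_doi(root, rule, doi):
    """Overwrite the DOI at the configured location; report success."""
    node = root.find(rule["path"], NS)
    if node is None:
        return False  # creating a missing node is left out of this sketch
    value = rule["prefix"] + doi
    if rule["attribute"]:
        node.set(rule["attribute"], value)
    else:
        node.text = value
    return True
```

The same rule can serve both purposes mentioned in this thread: checking
whether a file already carries a DOI, and writing the new one.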

Best,

Andreas



Re: Place(s) for idno[@type="DOI"]

Mylonas, Elli
Dear all - This sounds really nice. 
We intend to store our document-level DOI in publicationStmt/idno[@type="DOI"], so we agree on the location. However, it's possible that a file already has a DOI. Is there any reason not to keep an existing DOI? Adding the new one is not a bad idea, as it inserts the file identifier into the file metadata, which is a recommended practice for Findability.

Does it make sense to add another qualifier, so as to differentiate a Zenodo DOI from a myRepo DOI? I know it's not great to have two DOIs, but it does happen.
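
One conceivable way to add such a qualifier is the subtype attribute on
<idno>, e.g. <idno type="DOI" subtype="zenodo">. The Python sketch below
assumes the schema in use permits that attribute. The function name and the
"zenodo" value are made up, and nobody in this thread has confirmed this as
the chosen approach. An existing DOI from another source is left untouched,
and only the DOI with the same qualifier is updated.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
IDNO = f"{{{TEI_NS}}}idno"


def add_qualified_doi(pub_stmt, doi, source):
    """Add or update the DOI for one source, keeping DOIs from other sources."""
    for idno in pub_stmt.findall(IDNO):
        if idno.get("type") == "DOI" and idno.get("subtype") == source:
            idno.text = doi  # same source: update in place
            return idno
    # No DOI from this source yet: append a new, qualified <idno>.
    idno = ET.SubElement(pub_stmt, IDNO, {"type": "DOI", "subtype": source})
    idno.text = doi
    return idno
```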

best --elli
[Elli Mylonas
 Center for Digital Scholarship
 University Library
 Brown University
 library.brown.edu/cds
(she, her, hers)]

On Wed, Jun 10, 2020 at 2:11 PM Andreas Wagner <[hidden email]> wrote:

Re: Place(s) for idno[@type="DOI"]

Christof Schöch
In reply to this post by Andreas Wagner
Dear Andreas, dear list,

Thanks, Andreas, for presenting this service. I really like the idea of making Zenodo's feature of pre-reserving a DOI available in your webservice using the Zenodo API. It opens the door to all kinds of activities involving mass-upload of XML-TEI files to Zenodo using the "right" DOI.

I'm thinking, for example, of papers in conference proceedings as a possible use case. In that case, what if additional files are associated with the XML-TEI file, such as images, a PDF rendered from the XML-TEI and the images, or a BibTeX file? Could everything be packed into a ZIP archive and then be uploaded to the deposit with the right DOI?

The second use case I am wondering about is the one Lou mentioned, the ELTeC collections we produce in the "Distant Reading" COST Action. I'm afraid this leads us away a bit from purely TEI concerns; I hope this is OK. Here, the level of granularity is a bit different. At the moment, we use the (brilliant!) GitHub-Zenodo connection feature that automatically generates a new version of a collection's deposit (and a new DOI) on Zenodo every time we make a new release of a collection. (We do not generate a new Zenodo deposit of each file every time there is a new push, and I admit that I would find this a bit excessive in our context.)

Now the problem is that the pre-reserved DOI feature is not available when using the GitHub-Zenodo deposit-on-release feature. What we have done so far to address this is to take advantage of the fact that DOIs on Zenodo are "versioned", so to speak: there is one overarching DOI for all versions of the deposit. This means we make a dummy deposit to get the overarching DOI. We can add this DOI to the README of the ELTeC collection and, if we feel like doing so, to each individual XML-TEI file inside that collection before we make any real release. However, this DOI will not be identical to the "versioned" DOI of the deposit once we make another release.

So the big question is: could a tool such as yours, using the Zenodo API, intervene in the GitHub-Zenodo deposit-on-release feature and pre-reserve a DOI for the next release? This could then be included in the README and the files just before making the release.

Best wishes all around,
Christof

Re: Place(s) for idno[@type="DOI"]

Andreas Wagner
Dear Christof, dear all,

My intention was to have one deposit/DOI per TEI file; that is why I was looking for an alternative to the tried and trusted GitHub-Zenodo integration in the first place. And I am using the "push" event notification because it makes it easier to learn which individual files have changed.

Dealing with multiple files that belong together, such as (our use case) multivolume works linked with XIncludes, is something I have on my to-do list for a future version. If such files are linked in other ways, some details will need to be clarified. (Is there a manifest of sorts?) Again, and to return to the initial subject, some of the questions would be: Where should the new DOI be written to? Where should I check for a DOI that indicates whether the present set of files is an update to an already existing Zenodo deposit? And so on.

Maybe it's better to discuss such details off-list? Since I don't have much of a preference as to how this happens, suggestions are very welcome in this respect, too. (Opening an issue in the tool's repository is always an option: https://gitlab.gwdg.de/rg-mpg-de/tei2zenodo)

I am travelling this weekend and this is a side project anyway, but I hope I will find a way to answer the question raised in the original post, and then announce a release with the current feature set soon. Thanks for all your feedback so far.

Andreas
