Automatically anonymize personal data

Wingerath, Markus
Dear list,

I'm currently working on a new schema that various archives will use to mark up some of their sources for digital and print publishing. The digital editions will be part of the relaunched portal archive.nrw.de.
My own project focuses on publishing the protocols of the 9th Cabinet session of the State government of North Rhine-Westphalia, dating from 1980-1985.
Some of the documents contain personal data which, in accordance with German law, has to be anonymized, and I would like this to happen automatically when the files are processed by any XML parser before print or digital publishing. Anonymization can take two forms: replacing a person's name with the initials, or substituting a whole paragraph or any other structural text element with an alternative text. Ideally the mark-up is also datable, so that it specifies a date after which the original text can be displayed (depending on the legal background this could be 30 years after the creation of a document, 10 years after a person's death or 110 years after a person's birth).
I initially thought about using the <choice> element for this, but I already use it to differentiate between a diplomatic and a corrected representation of the source, and I would prefer a separate element to minimize errors by myself or other encoders.

Is there an element within the TEI framework for this kind of job, or should I define a new element?

Kind regards,
M. Wingerath

Landesarchiv Nordrhein-Westfalen
Abteilung Rheinland
Schifferstraße 30
47059 Duisburg
Tel.: +49 203 98721-427
Fax: +49 203 98721-111
Re: Automatically anonymize personal data

Torsten Schassan-2
Dear Markus,

some ideas about a setting that would do the trick:

a) Your documents are described on the meta level (using msDesc, at the
moment) and dated properly, addressing the document as a whole.

b) All passages that could be subject to anonymisation are marked up and
the markup references the person speaking.

c) All entities mentioned that could be subject to anonymisation are
marked up and the markup references the respective entity. The
description of the entity (such as the person speaking in b) contains
dates of birth and death or other dates that allow you to distinguish
between data that can be processed as is and data that has to be
anonymised.


All the anonymisations and substitutions are done during the processing,
using the dates in the description of the documents and the dates found
in the prosopography.

You would only have to make sure that the files are always processed,
applying rules that depend on the legal status of the user, the time
elapsed, etc.

The dating of the markup is of no interest then.
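
A minimal sketch of this setup, using Python's standard xml.etree.ElementTree. Several things here are assumptions made for illustration: the sample is a deliberately simplified fragment and not a complete valid TEI document, the person list sits in <back>, names are plain text inside <persName>, and only the "10 years after death" rule is applied. A person without a known date of death is treated as still protected.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
YEARS_AFTER_DEATH = 10  # one of the periods named in the thread

# Invented sample data; the names and dates are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><persName ref="#p1">Erika Mustermann</persName> and
         <persName ref="#p2">Max Beispiel</persName> attended.</p>
    </body>
    <back>
      <listPerson>
        <person xml:id="p1"><persName>Erika Mustermann</persName><death when="2015-03-01"/></person>
        <person xml:id="p2"><persName>Max Beispiel</persName><death when="1990-07-12"/></person>
      </listPerson>
    </back>
  </text>
</TEI>"""


def add_years(start: date, years: int) -> date:
    """Same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def initials(name: str) -> str:
    """'Erika Mustermann' becomes 'E. M.'"""
    return " ".join(part[0] + "." for part in name.split())


def anonymize(root: ET.Element, today: date) -> ET.Element:
    """Replace referenced names by initials while the person is still protected."""
    # Lookup table from the prosopography: xml:id -> date of death
    deaths = {}
    for person in root.iter(f"{{{TEI}}}person"):
        death = person.find("tei:death", NS)
        if death is not None and death.get("when"):
            deaths[person.get(XML_ID)] = date.fromisoformat(death.get("when"))

    # Only names that carry a @ref, i.e. mentions in the running text
    for name in root.findall(".//tei:persName[@ref]", NS):
        died = deaths.get(name.get("ref").lstrip("#"))
        protected = died is None or today < add_years(died, YEARS_AFTER_DEATH)
        if protected:
            name.text = initials(name.text)
    return root
```

Calling `anonymize(ET.fromstring(SAMPLE), date.today())` rewrites only the mentions in the text. The person list itself still contains the full names, so it would have to be dropped or filtered separately before anything is published.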

Best, Torsten



--
Torsten Schassan - Digitale Editionen, Abteilung Handschriften und
Sondersammlungen
Herzog August Bibliothek, Postfach 1364, D-38299 Wolfenbuettel, Tel.:
+49-5331-808-130 (Fax -165)
Handschriftendatenbank* http://diglib.hab.de/?db=mss
Re: Automatically anonymize personal data

David Maus
In reply to this post by Wingerath, Markus
On Thu, 29 Jun 2017 15:36:45 +0200,
Wingerath, Markus wrote:
> Is there any way/any element within the TEI-framework for this kind of job or should I define a new element tag?

First, I would recommend *not* reusing TEI elements or attributes for
this purpose. As I understand the problem, you need to encode
a rather complex legal framework on top of the structural/semantic
encoding of the source material. As the law changes, you might also be
required to adjust the framework and re-encode the "legal layer".

Using custom elements might cause a problem because they need to "fit"
into the source document's structure. I would use attributes in a
custom namespace that encoders put on the respective elements of the
source document.

You might also take a look at processing instructions.

Processing-wise, I would recommend using a dedicated pre-processing
step that performs the anonymization on the source document. The
result of this step is then fed to the print/publishing steps.
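
A rough sketch of such a pre-processing step, combined with attributes in a custom namespace, written with Python's standard xml.etree.ElementTree. The namespace URI and the attribute names legal:until and legal:alt are invented for this example and are not part of TEI or of any existing vocabulary. The step replaces the content of every element whose embargo date lies in the future and strips the legal attributes from the output in all cases.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "http://www.tei-c.org/ns/1.0"
LEGAL = "http://example.org/ns/legal"  # hypothetical custom namespace

UNTIL = f"{{{LEGAL}}}until"  # hypothetical: first day the original may be shown
ALT = f"{{{LEGAL}}}alt"      # hypothetical: replacement text until then

# Invented sample fragment with the "legal layer" as attributes
SAMPLE = f"""<body xmlns="{TEI}" xmlns:legal="{LEGAL}">
  <p legal:until="2040-01-01" legal:alt="[Paragraph withheld]">Sensitive paragraph.</p>
  <p legal:until="2010-01-01" legal:alt="[Paragraph withheld]">Embargo has expired.</p>
  <p>Unrestricted paragraph.</p>
</body>"""


def preprocess(root: ET.Element, today: date) -> ET.Element:
    """Apply the legal layer, then remove it from the tree."""
    # Materialise the iterator first, because the tree is modified below
    for elem in list(root.iter()):
        until = elem.get(UNTIL)
        if until is not None and today < date.fromisoformat(until):
            # Drop all child elements and replace the text content
            for child in list(elem):
                elem.remove(child)
            elem.text = elem.get(ALT, "[withheld]")
        # The published output should not expose the legal attributes
        elem.attrib.pop(UNTIL, None)
        elem.attrib.pop(ALT, None)
    return root
```

The result of `preprocess(ET.fromstring(SAMPLE), date.today())` can then be serialised with `ET.tostring` and handed to the print or web pipeline, which never sees the protected text.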

You should also ask on the XML-Dev mailing list [XMLDEV]. I think it's
safe to assume that you are not the first to face this challenge,
and maybe some hoopy froods over at xml-dev can share their experiences.

HTH,
  -- David

[XMLDEV] http://www.xml.org/xml-dev

--
David Maus, Bibliothekarische IT / Digital Humanities
Herzog August Bibliothek, D-38299 Wolfenbüttel, Phone +49 5331 808-317
PGP Key 0x27023DFCE78FF66C
http://dmaus.name ~ http://github.com/dmj
Re: Automatically anonymize personal data

David Maus
In reply to this post by Wingerath, Markus

Addendum: If you find a workable solution, I'd love to hear about it at
the XML Prague [PRAGUE] or XML London [LONDON] conferences.

All the best,
  -- David

[PRAGUE] http://www.xmlprague.cz/
[LONDON] http://www.xmllondon.com
