Binary features question

Binary features question

Torsten Schassan-2
Dear list,

If I want to encode a piece of information about an object that is in general binary but could just as well be unknown, what would be the best way to encode it?

Background: In the new manuscript portal for Germany, http://handschriftenportal.de, we want to supply information about whether a manuscript is illuminated or not and whether it has music notation or not. The elements <decoNote> and <musicNotation> are meant for describing decoration and notation, but they are neither meant for nor sufficient for stating the sheer fact of the presence or absence of the two. We therefore thought about having a feature structure in <head> to keep this information:

<fs type="coreFields">
   <f name="status"><symbol value="___KulturObjektDokument:status___"/></f>
   <f name="musicNotation"><binary value="___true|false___"/></f>
   <f name="decoration"><binary value="___true|false___"/></f>
</fs>

(The additional <f name="status"> will keep the status of the manuscript: whether it is available, lost, destroyed, etc.)

But when using <binary>, the value "unknown" is not allowed.

So I've got two questions:

1. Do you think this is an appropriate way to encode this information or do you use an alternative encoding?
2. What would be the best way to allow within <f> the values true, false, and unknown? Would I just use @fVal? (Although I like <binary>, it lacks the "unknown". Could that be subject to a feature request though?)

With best regards,
Torsten

-- 
Torsten Schassan - Abteilung Handschriften und Sondersammlungen / Digitale Editionen
Herzog August Bibliothek, D-38299 Wolfenbuettel, Tel.: +49 5331 808-130 Fax -165
Handschriftendatenbank: http://diglib.hab.de/?db=mss

Re: Binary features question

Grüntgens, Max

Dear Torsten,

First, I (unfortunately) do not have an answer that solves the question, but rather a question of my own to make sure that I have understood correctly:

My question: Is it possible that, e.g., <decoDesc> appears in files for manuscripts that are not illuminated (perhaps stating the existence of other decoration)?

Because if illumination is a necessary requirement for using <decoDesc> (or a specifically typed <decoNote>), that usage would, IMHO, already encode the binary classification mentioned(?).

Maybe the ternary addition of "unknown" could be annotated via @cert? A <decoDesc>/<decoNote> corresponding to illumination with very low certainty might then indicate that it is virtually unknown (I'm currently rather unsure whether that makes sense...).

(I guess, however, that it is not that easy.)

Best regards,

Max


Max Grüntgens
Digitale Akademie
Projekt Propyläen. Forschungsplattform zu Goethes Biographica.
Akademie der Wissenschaften und der Literatur | Mainz
www.goethe-biographica.de
www.digitale-akademie.de
www.adwmainz.de
https://orcid.org/0000-0001-8736-9393

Address:
Digitale Akademie im Frankfurter Goethe-Haus
Freies Deutsches Hochstift
Großer Hirschgraben 23-25
60311 Frankfurt am Main
Tel. +49 69 1 38 80 286
Fax. +49 69 1 38 80 222

From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Torsten Schaßan <[hidden email]>
Sent: Friday, 4 September 2020 10:55:33
To: [hidden email]
Subject: Binary features question
 

Re: Binary features question

Jean-Paul Rehr
In reply to this post by Torsten Schassan-2
>But when using <binary>, the value "unknown" is not allowed.

In binary (boolean) terms, "unknown" is effectively equivalent to no value, represented by the absence of a value (nothing, blank, expressed in SQL terms as NULL). A binary field should thus never contain anything but true/false (0/1). Accordingly, <tei:binary> accepts no other value.

If your preference is for a positive expression of "unknown", I would suggest <string> with the content "true", "false", or "unknown".

That said, I don't see the need for the extra element when the following is TEI-compliant:

<f name="musicNotation">true</f>
<f name="decoration">false</f>
<f name="somethingelse">unknown</f>
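On the processing side, the NULL analogy can be sketched as follows. This is only a minimal illustration, assuming the plain-text convention shown above; the function names `tri_state` and `read_features` and the wrapping <fs> are hypothetical and not part of any TEI tooling. "unknown" (or empty content) is mapped to Python's `None`, and anything outside the closed list is rejected.

```python
from typing import Dict, Optional
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative fragment only: the three <f> elements from the message,
# wrapped in an <fs> so that they can be parsed as one document.
SAMPLE = """
<fs xmlns="http://www.tei-c.org/ns/1.0" type="coreFields">
  <f name="musicNotation">true</f>
  <f name="decoration">false</f>
  <f name="somethingelse">unknown</f>
</fs>
"""


def tri_state(text: Optional[str]) -> Optional[bool]:
    """Map 'true'/'false' to a bool and 'unknown' (or no content) to None."""
    value = (text or "").strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    if value in ("unknown", ""):
        return None
    # Anything else is outside the closed list of values.
    raise ValueError(f"unexpected feature value: {text!r}")


def read_features(fs_xml: str) -> Dict[str, Optional[bool]]:
    """Read every <f> child of an <fs> into a name -> tri-state mapping."""
    fs = ET.fromstring(fs_xml)
    return {f.get("name"): tri_state(f.text) for f in fs.findall("tei:f", TEI_NS)}


features = read_features(SAMPLE)
```

A consumer can then distinguish `features["decoration"] is False` (stated absence) from `features["somethingelse"] is None` (not known).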

It seems to me, however, that if these indications are already being encoded via other elements in the TEI file, then there is a risk of a mistaken or contradictory entry, i.e. <decoNote> exists but someone accidentally enters <f name="decoration">false</f>. My preference would be to have any query look up the existence of <decoNote> to determine the status.
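The contradiction described here can be detected mechanically. The following is a rough sketch, not a validation routine from any existing project: the function name `contradicts` and the sample record (which is an invented, deliberately incomplete <msDesc> fragment) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample record; not a complete or valid <msDesc>.
SAMPLE = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <head>
    <fs type="coreFields">
      <f name="decoration">false</f>
    </fs>
  </head>
  <physDesc>
    <decoDesc>
      <decoNote>Historiated initial on fol. 1r.</decoNote>
    </decoDesc>
  </physDesc>
</msDesc>
"""


def contradicts(ms_desc_xml: str) -> bool:
    """True if the record says decoration=false but contains a <decoNote>."""
    root = ET.fromstring(ms_desc_xml)
    flag = root.find(".//tei:f[@name='decoration']", TEI_NS)
    says_false = flag is not None and (flag.text or "").strip() == "false"
    has_deco_note = root.find(".//tei:decoNote", TEI_NS) is not None
    return says_false and has_deco_note


is_contradictory = contradicts(SAMPLE)
```

Run over a whole collection, a check of this kind would list the records in which the explicit flag and the descriptive element disagree.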


Best,
JPR



Re: Binary features question

Torsten Schassan-2
Dear Jean-Paul, dear Max, dear all,

thank you for your suggestions.

The reason why I wanted to use something like <binary> is the stronger semantics of the element and the possibility of having a closed list of values: textual content can "only" be controlled via Schematron rules, whereas attribute values can be defined in the ODD and subsequently expressed in the schema language(s).

As both of you were referring to <decoDesc>, I can only repeat what (in my opinion) is true for the whole TEI: the semantics of the descriptive elements, especially within the msdescription module, is positivistic. It is thus only possible to describe what is there and not really what isn't. So if I want to describe the absence of decoration, it seems awkward to use <decoDesc>.

My question was therefore aimed at making the statement explicit: the presence of <decoDesc> and/or <musicNotation> would implicitly mean that illumination or music notation is present. The absence could not be recorded so far.

With best regards,
Torsten


From: Jean-Paul Rehr <[hidden email]>
To: <[hidden email]>
Sent: 4 September 2020, 11:23
Subject: Re: Binary features question


Re: Binary features question

Jean-Paul Rehr
So, you've hit the root of the problem:

>if I want to describe the absence of decoration it seems awkward to use <decoDesc>

As for describing what isn't there, I'm afraid I don't see anything inherently positivistic about the simple existence of an element, much as there is nothing positivistic in a database about the existence of a column in a table. 

Describing absence separately from the element that positively confirms presence leads, as I mentioned, to possibly contradictory encoding in the same document; to me, in terms of the risk to data integrity, that is a worse fate!

Best,
JPR



On Fri, Sep 4, 2020 at 11:37 AM Torsten Schaßan <[hidden email]> wrote:

Re: Binary features question

Torsten Schassan-2
Dear Jean-Paul,

I see three possible cases here:

1. I have described a manuscript which is decorated: I use <decoDesc>.
2. I have described a manuscript which is not decorated: Is it enough to not use <decoDesc>?
3. I have described a manuscript but not the decoration: How would you encode that one?

Using an explicit encoding I would supply a feature=true in case 1, a feature=false in case 2 and a feature=unknown in case 3.

Following your database analogy: does an empty element have the same semantics as a non-existent element?

And concerning the possible loss of data integrity: what happens to data integrity if I allow <decoDesc> to describe the absence of decoration, e.g. when space was left in the manuscript for decoration that was never filled in, or when other manuscripts of the same kind are usually decorated but this particular one isn't?

With best regards,
Torsten


From: Jean-Paul Rehr <[hidden email]>
To: <[hidden email]>
Sent: 4 September 2020, 11:52
Subject: Re: Binary features question


Re: Binary features question

Jean-Paul Rehr
Dear Torsten,

Admittedly, my background leads me, for better or worse, to think a lot about the upstream and downstream implications of TEI encoding. So what I suggest here is from my view of project lifecycle management, and not just addressing "which field". :)

If we leave aside binary and move to explicit categorization, I would consider doing something like the following:

1. I have described a manuscript which is decorated:
<decoDesc type="decorated">

2. I have described a manuscript which is not decorated:
<decoDesc type="undecorated">

3. I have described a manuscript but not the decoration: 
<decoDesc type="unknown"> (or even more explicitly "not-yet-catalogued")

In the schema, I would make <decoDesc> mandatory together with @type, and make the @type values part of the schema so that they can be validated at any point. This would then be added to the template used to create the documents, with a "blank" value, so that IDE validation flags the field for the user to fill in.
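In a real project this constraint would live in the schema itself; purely to illustrate the logic, the check can be sketched in Python. Everything named here is an assumption: the function `check_deco_desc`, the closed list in `ALLOWED_TYPES` (which adopts the more explicit "not-yet-catalogued" variant), and the template fragment, which is not a complete <msDesc>.

```python
from typing import List
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical closed list of @type values.
ALLOWED_TYPES = {"decorated", "undecorated", "not-yet-catalogued"}

# Hypothetical template fragment with a deliberately blank @type.
TEMPLATE = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <physDesc>
    <decoDesc type="">
      <p/>
    </decoDesc>
  </physDesc>
</msDesc>
"""


def check_deco_desc(ms_desc_xml: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    root = ET.fromstring(ms_desc_xml)
    deco_descs = root.findall(".//tei:decoDesc", TEI_NS)
    if not deco_descs:
        return ["<decoDesc> is missing"]
    problems = []
    for deco in deco_descs:
        value = deco.get("type")
        if value is None:
            problems.append("<decoDesc> has no @type")
        elif value not in ALLOWED_TYPES:
            problems.append(f"@type={value!r} is not an allowed value")
    return problems


template_problems = check_deco_desc(TEMPLATE)
```

The untouched template fails the check because of its blank @type, which is the behaviour wanted from the "blank" value: the cataloguer has to make an explicit choice before the record validates.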

Depending on workflow and project scope, I would be able to query at any point the "state" of both encoding and cataloging. For example, periodically look for @type="not-yet-catalogued" to create work lists for further research. 

All of this while recognizing that <decoDesc> cannot be an empty element; it requires <p>, <ab>, etc. To me, that is another opportunity: query the contents of the element with @type="decorated" to see whether it has been described according to the encoding practices set forth by the project leader.
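The two queries mentioned above (a work list of records still to be catalogued, and a check on the contents of decorated records) might look like the sketch below. The mini-corpus, the identifiers, the helper names, and in particular the project rule "a decorated manuscript needs at least one <decoNote>" are all invented for illustration.

```python
from typing import Dict, List
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def deco_desc(type_value: str, inner: str) -> str:
    """Build a <decoDesc> fragment in the TEI namespace (sample data only)."""
    return (
        '<decoDesc xmlns="http://www.tei-c.org/ns/1.0" '
        f'type="{type_value}">{inner}</decoDesc>'
    )


# Hypothetical mini-corpus: manuscript identifier -> <decoDesc> fragment.
CORPUS = {
    "ms-001": deco_desc("decorated", "<decoNote>Initials in red and blue.</decoNote>"),
    "ms-002": deco_desc("decorated", "<p>To be described.</p>"),
    "ms-003": deco_desc("not-yet-catalogued", "<p/>"),
}


def work_list(corpus: Dict[str, str]) -> List[str]:
    """Identifiers whose decoration has not yet been catalogued."""
    return sorted(
        ms_id
        for ms_id, fragment in corpus.items()
        if ET.fromstring(fragment).get("type") == "not-yet-catalogued"
    )


def decorated_without_deco_note(corpus: Dict[str, str]) -> List[str]:
    """Identifiers typed 'decorated' that lack a <decoNote> child.

    The rule that such records need a <decoNote> is an assumed project
    convention, not something the TEI Guidelines require.
    """
    result = []
    for ms_id, fragment in corpus.items():
        deco = ET.fromstring(fragment)
        if deco.get("type") == "decorated" and deco.find("tei:decoNote", TEI_NS) is None:
            result.append(ms_id)
    return sorted(result)


todo = work_list(CORPUS)
incomplete = decorated_without_deco_note(CORPUS)
```

Here `todo` collects the records for further research, while `incomplete` collects records that claim decoration but do not yet describe it in the agreed way.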

I hope this perspective helps more than hinders.

Best,
JPR


On Fri, Sep 4, 2020 at 12:20 PM Torsten Schaßan <[hidden email]> wrote:

Re: Binary features question

James Cummings-5

Hi all,

In terms of the statement made by the existence of a <decoDesc> element, since the description says that it 'contains a description of the decoration of a manuscript or other object' I understand Torsten's feeling that its presence signals that there is decoration in the manuscript. However, for me, that description of the decoration might equally be comments on its lack of existence or as Torsten suggests the spaces left for it that were no filled by an illuminator. I would generally take straightforward the approach suggested by Jean-Paul and classify the types of <decoDesc> based on a convenient typology of some sort. (Another approach might be that in <msDesc>s where there is decoration I would use <decoNote> underneath that <decoDesc> but in ones where it didn't have decoration I might only use <p>. ;-) ) Perhaps the description should be modified to say 'description of the state of decoration' to suggest one could use it for noting a lack of decoration?

The question of whether the existence of an element in a metadata description such as <msDesc> (or <object>) indicates the existence of the feature that element documents is an interesting one. Whereas I might use <decoNote> to record information about the lack of decoration, I would not, for example, use <accMat> to record information about the lack of accompanying material. I recognise this is inconsistent. The presence of an element does not necessarily provide canonical information about the presence or absence of that feature. The absence of an element altogether, or the presence of a contentless ('empty') element, might, but this depends on the consistency, completeness, and competency of the cataloguing and encoding. Moreover, understanding what you mean by that markup (or its absence) presupposes that those reading the markup have read and understood your local encoding guidelines stored in your ODD.

Being a pragmatic person, I find using @type to classify these in some way the most straightforward. It documents the distinction to some degree in the markup, even if it pushes the responsibility for doing anything about it to the processing.

Many thanks,

James 


--

Dr James Cummings, [hidden email]
Senior Lecturer in Late-Medieval Literature and Digital Humanities

School of English, Newcastle University


From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Jean-Paul Rehr <[hidden email]>
Sent: 04 September 2020 12:20
To: [hidden email] <[hidden email]>
Subject: Re: Binary features question
 


Dear Torsten,

Admittedly, my background leads me, for better or worse, to think a lot about the upstream and downstream implications of TEI encoding. So what I suggest here is from my view of project lifecycle management, and not just addressing "which field". :)

If we leave aside binary and move to explicit categorization, I would consider doing something like the following:

1. I have described a manuscript which is decorated:
<decoDesc type="decorated">

2. I have described a manuscript which is not decorated:
<decoDesc type="undecorated">

3. I have described a manuscript but not the decoration: 
<decoDesc type="unknown"> (or even more explicitly "not-yet-catalogued")

In the schema, I would make <decoDesc> mandatory with a required @type, and make the permitted @type values part of the schema so that they can be validated at any point. This would then be added to the template used to create the documents, with a "blank" value so that IDE validation flags the field for the user to fill in.
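The validation idea above would normally live in the schema itself. As a rough stand-in, here is a minimal Python sketch of the same check, assuming the project's customised schema permits @type on <decoDesc> and that the closed list consists of the three values from the cases above. The function name `check_deco_desc` and the trimmed `template` record are hypothetical, and the fragment is not a complete valid TEI record.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed closed value list, taken from the three cases listed above.
ALLOWED_TYPES = {"decorated", "undecorated", "unknown"}


def check_deco_desc(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    root = ET.fromstring(xml_text)
    deco_descs = root.findall(".//tei:decoDesc", TEI_NS)
    if not deco_descs:
        return ["<decoDesc> is missing"]
    problems = []
    for deco in deco_descs:
        value = deco.get("type")
        if value is None:
            problems.append("<decoDesc> has no @type")
        elif value not in ALLOWED_TYPES:
            problems.append(f"@type={value!r} is not in the closed list")
    return problems


# Hypothetical template record with a deliberately blank @type.
template = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <physDesc><decoDesc type=""><p/></decoDesc></physDesc>
</msDesc>
"""

print(check_deco_desc(template))
```

The blank @type in the template is reported as a problem until the cataloguer replaces it with one of the allowed values, which mirrors the "flag the user" behaviour described above.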

Depending on workflow and project scope, I would be able to query at any point the "state" of both encoding and cataloging. For example, periodically look for @type="not-yet-catalogued" to create work lists for further research. 

All of this while recognizing that <decoDesc> cannot be an empty element; it requires <p>, <ab>, etc. To me that is another opportunity: query the contents of the elements with @type="decorated" to see whether the decoration has been described according to the encoding practices set forth by the project leader.
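The two kinds of query mentioned above (work lists by @type, and a content check on decorated records) might look roughly like this in Python. The record identifiers, the trimmed fragments, and the rule "a decorated record must contain at least one <decoNote>" are assumptions made for illustration, not project facts.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
TEI = 'xmlns="http://www.tei-c.org/ns/1.0"'

# Hypothetical record identifiers mapped to trimmed <msDesc> fragments.
RECORDS = {
    "ms-001": f'<msDesc {TEI}><physDesc><decoDesc type="decorated">'
              "<decoNote>Ten initials.</decoNote></decoDesc></physDesc></msDesc>",
    "ms-002": f'<msDesc {TEI}><physDesc><decoDesc type="not-yet-catalogued">'
              "<p/></decoDesc></physDesc></msDesc>",
    "ms-003": f'<msDesc {TEI}><physDesc><decoDesc type="decorated">'
              "<p>To be described.</p></decoDesc></physDesc></msDesc>",
}


def records_with_type(records: dict[str, str], wanted: str) -> list[str]:
    """Identifiers of records whose <decoDesc> carries the given @type."""
    hits = []
    for rec_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        if root.findall(f".//tei:decoDesc[@type='{wanted}']", TEI_NS):
            hits.append(rec_id)
    return hits


def decorated_without_note(records: dict[str, str]) -> list[str]:
    """Identifiers of 'decorated' records that lack any <decoNote> child."""
    hits = []
    for rec_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for deco in root.findall(".//tei:decoDesc[@type='decorated']", TEI_NS):
            if deco.find("tei:decoNote", TEI_NS) is None:
                hits.append(rec_id)
                break  # report each record only once
    return hits


print(records_with_type(RECORDS, "not-yet-catalogued"))
print(decorated_without_note(RECORDS))
```

The first call yields the work list for further cataloguing; the second lists records that claim decoration but do not yet describe it in the assumed form.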

I hope this perspective helps more than hinders.

Best,
JPR


On Fri, Sep 4, 2020 at 12:20 PM Torsten Schaßan <[hidden email]> wrote:
Dear Jean-Paul,

I see three possible cases here:

1. I have described a manuscript which is decorated: I use <decoDesc>.
2. I have described a manuscript which is not decorated: Is it enough to not use <decoDesc>?
3. I have described a manuscript but not the decoration: How would you encode that one?

Using an explicit encoding I would supply a feature=true in case 1, a feature=false in case 2 and a feature=unknown in case 3.

Following your analogy of a database: does an empty element have the same semantics as a non-existent element?

And concerning the possible loss of data integrity: if I allow <decoDesc> to describe the absence of decoration, e.g. when space was left in the manuscript for decoration that was never executed, or when other manuscripts of the same kind are usually decorated but this one isn't, what about the data integrity then?

Best regards,
Torsten



From: Jean-Paul Rehr <[hidden email]>
To: <[hidden email]>
Sent: 04.09.2020 11:52
Subject: Re: Binary features question

So, you've hit the root of the problem:

>if I want to describe the absence of decoration it seems awkward to use <decoDesc>

As for describing what isn't there, I'm afraid I don't see anything inherently positivistic about the simple existence of an element, much as there is nothing positivistic in a database about the existence of a column in a table. 

Describing absence separately from the element that positively confirms presence leads, as I mentioned, to possibly contradictory encoding in the same document; to me, in terms of risk to data integrity, that is a worse fate!

Best,
JPR



On Fri, Sep 4, 2020 at 11:37 AM Torsten Schaßan <[hidden email]> wrote:
Dear Jean-Paul, dear Max, dear all,

thank you for your suggestions.

The reason why I wanted to use something like <binary> is the stronger semantics of the element and the possibility of having a closed list of values: textual content can "only" be controlled via Schematron rules, whereas attribute values can be defined in the ODD and subsequently expressed in the schema language(s).

As both of you referred to <decoDesc>, I can only repeat what (in my opinion) is true for the whole TEI: the semantics of the descriptive elements, especially within the msdescription module, is positivistic. It is only possible to describe what's there and not really what isn't. So if I want to describe the absence of decoration it seems awkward to use <decoDesc>.

My question was therefore aimed at making the statement explicit: the presence of <decoDesc> and/or <musicNotation> would implicitly mean that illumination or music notation is present, whereas their absence could not be recorded so far.

Best regards,
Torsten



From: Jean-Paul Rehr <[hidden email]>
To: <[hidden email]>
Sent: 04.09.2020 11:23
Subject: Re: Binary features question

>But when using <binary>, the value "unknown" is not allowed.

In binary (Boolean) terms, "unknown" is effectively equivalent to no value, represented by the absence of a value (nothing, blank, expressed in SQL terms as NULL). A binary field should thus never contain anything but true/false (1/0). Accordingly, <tei:binary> accepts no other value.

If you prefer a positive expression of "unknown", I would suggest <string> with the content "true", "false", or "unknown".

That said, I don't see the need for the extra element when the following is TEI-compliant:

<f name="musicNotation">true</f>
<f name="decoration">false</f>
<f name="somethingelse">unknown</f>
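On the processing side, the three textual values above map naturally onto a nullable Boolean, in line with the NULL remark. A minimal Python sketch, assuming the features carry plain text content as in the example; the helper name `feature_value` is hypothetical and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def feature_value(fs: ET.Element, name: str) -> Optional[bool]:
    """Map an <f> to True, False, or None (unknown or not recorded)."""
    for f in fs.findall("f"):
        if f.get("name") == name:
            text = (f.text or "").strip().lower()
            if text == "true":
                return True
            if text == "false":
                return False
            return None  # "unknown" or any other content
    return None  # feature not present at all


fs = ET.fromstring(
    '<fs type="coreFields">'
    '<f name="musicNotation">true</f>'
    '<f name="decoration">false</f>'
    '<f name="somethingelse">unknown</f>'
    "</fs>"
)

for name in ("musicNotation", "decoration", "somethingelse"):
    print(name, feature_value(fs, name))
```

Note the design choice: an explicit "unknown" and a missing feature both come back as None here. A project that needs to tell the two apart would have to return a distinct value for one of them.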

It seems to me, however, that if these indications are already being encoded via other elements in the TEI file, then there is a risk of a mistaken or contradictory entry, i.e. <decoNote> exists but someone accidentally enters <f name="decoration">false</f>. My preference would be to have any query look up the existence of <decoNote> to determine the status.
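The contradiction described above can be detected mechanically. A minimal Python sketch, assuming the feature is recorded as plain text inside <f name="decoration"> and that only the specific pair named above (a <decoNote> exists, yet the feature says "false") counts as a conflict; the function name and the trimmed record are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def decoration_conflict(xml_text: str) -> bool:
    """True if <decoNote> exists while the 'decoration' feature says false."""
    root = ET.fromstring(xml_text)
    has_note = root.find(".//tei:decoNote", TEI_NS) is not None
    feature = root.find(".//tei:f[@name='decoration']", TEI_NS)
    if feature is None:
        return False  # nothing recorded, so nothing to contradict
    claimed = (feature.text or "").strip().lower()
    return has_note and claimed == "false"


# Hypothetical trimmed record containing exactly the mistake described above.
record = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <head><fs type="coreFields"><f name="decoration">false</f></fs></head>
  <physDesc><decoDesc><decoNote>Gilded initials.</decoNote></decoDesc></physDesc>
</msDesc>"""

print(decoration_conflict(record))
```

A check like this only reports the inconsistency; deciding which of the two statements is correct remains a cataloguing question.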


Best,
JPR


On Fri, Sep 4, 2020 at 10:55 AM Torsten Schaßan <[hidden email]> wrote:
Dear list,

if I want to encode information about an object that is in general binary, but could also be unknown, what would be the best way to encode it?

Background: In the new manuscript portal for Germany, http://handschriftenportal.de, we want to supply information about whether a manuscript is illuminated or not and whether it has music notation or not. As the elements <decoNote> and <musicNotation> are meant for describing decoration and notation, but are neither meant for nor sufficient for stating the sheer fact of the presence or absence of the two, we thought about having a feature structure in <head> to keep this information:

<fs type="coreFields">
   <f name="status"><symbol value="___KulturObjektDokument:status___"/></f>
   <f name="musicNotation"><binary value="___true|false___"/></f>
   <f name="decoration"><binary value="___true|false___"/></f>
</fs>

(The additional <f name="status"> will keep the status of the manuscript: whether it is available, lost, destroyed, etc.)

But when using <binary>, the value "unknown" is not allowed.

So I've got two questions:

1. Do you think this is an appropriate way to encode this information or do you use an alternative encoding?
2. What would be the best way to allow within <f> the values true, false, and unknown? Would I just use @fVal? (Although I like <binary>, it lacks the "unknown". Could that be subject to a feature request though?)

Best regards,
Torsten
