abstract layer instead of tei:p

Mathias Göbel

Dear list,

Given a speech like

Joanna Doe: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.

which is usually encoded as

<tei:sp>
  <tei:speaker>Joanna Doe</tei:speaker>
  <tei:p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.</tei:p>
</tei:sp>

Now I would like to omit the text passage within tei:sp and publish a rather abstract layer with counts of spoken words or sentences.

What would you use to encode this?

Would it be something like

<tei:sp>
  <tei:speaker>Joanna Doe</tei:speaker>
  <tei:gap extent="1" unit="sentences"/>
  <tei:gap extent="24" unit="words"/>
  <tei:gap extent="155" unit="chars"/>
</tei:sp>

Maybe I am wrong, but I don't think multiple tei:gap elements are useful here, because the text I want to omit is a single passage, not multiple passages. But a single tei:gap is not enough, because I want to preserve several features at once. Within the teiHeader there are tei:extent and tei:measure, but they are not available in tei:sp.
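
For reference, the three figures used in the example (1 sentence, 24 words, 155 characters) can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `speech_counts` is made up for illustration, the sentence splitter is deliberately naive, and characters are counted including spaces and punctuation.

```python
import re


def speech_counts(text: str) -> dict:
    """Return sentence, word and character counts for one speech."""
    stripped = text.strip()
    # Naive sentence split: break after ., ! or ? when whitespace follows.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    # Words are whitespace-separated tokens; punctuation stays attached.
    words = stripped.split()
    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars": len(stripped),  # includes spaces and punctuation
    }


speech = (
    "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam "
    "nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, "
    "sed diam voluptua."
)
counts = speech_counts(speech)
print(counts)
```

A real corpus would probably need a proper tokenizer for abbreviations, ellipses and the like; the point here is only to pin down what each number is meant to count.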

Best,
Mathias


Re: abstract layer instead of tei:p

Lou Burnard
Multiple gap elements inside an SP would indeed imply multiple consecutive omissions, which is clearly not what you want here. The same would apply to multiple DEL elements.  But since your markup is not intended to capture the content of the speech at all, I wonder whether this approach is appropriate in any case. An SP element with no content is arguably not semantically conformant, since the SP asserts that there is some transcribed "spoken text" here: that's why an entirely empty SP element is invalid.  Maybe you would do better to create your own empty element instead?

<mg:spSize sentences="1" words="24" chars="155"/>

Declare that in your ODD, and add it to an appropriate class (I suggest model.global, or model.stageLike), and you're done.
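
If such elements were generated by script, a minimal Python sketch using the standard library's ElementTree could look like the following. The namespace URI bound to the `mg` prefix and the helper name `sp_size` are assumptions made for illustration; the ODD declaration itself is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the mg: prefix (an assumption, not a real schema).
MG_NS = "http://example.org/ns/mg"
ET.register_namespace("mg", MG_NS)


def sp_size(sentences: int, words: int, chars: int) -> ET.Element:
    """Build an empty mg:spSize element carrying the three counts as attributes."""
    return ET.Element(
        f"{{{MG_NS}}}spSize",
        {"sentences": str(sentences), "words": str(words), "chars": str(chars)},
    )


elem = sp_size(1, 24, 155)
# Serialises as an empty element with an xmlns:mg declaration.
print(ET.tostring(elem, encoding="unicode"))
```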


Re: abstract layer instead of tei:p

Mathias Göbel

I am preparing a new corpus for an existing pipeline, and because of copyright restrictions I cannot publish the full text, but I want to preserve some features of the text. Using a custom element is possible, but the pipeline I want to use will query for tei:sp/tei:speaker/@who. Since the software requires these elements, I have to use them while struggling with TEI conformance.

If any tei:sp without tei:l or tei:p is semantically invalid, I cannot do this in conformance with TEI, and I will do as you suggest. What about a tei:note? A tei:measure is allowed within a tei:note. Unfortunately I would still have no "primary" content in tei:sp. :-(

There is one rather strange idea left: replacing the text with a dummy that shares the same features.


--
Mathias Göbel
University of Göttingen
Göttingen State and University Library
D-37070 Göttingen

Research and Development
Papendiek 14 (hist. Building, Room 2.408)
+49 551 39-20184 (Tel., Wednesday to Friday)
+49 551 39-33856 (Fax.)

Digital Library
Software and Service Development
Platz der Göttinger Sieben 1 (Central Library, Room 2.129)
+49 551 39-10230 (Tel., Monday/Tuesday)

[hidden email]
http://www.sub.uni-goettingen.de

Re: abstract layer instead of tei:p

Lou Burnard
Yes NOTE would also be possible, though less useful if you are likely to encounter actual notes in your text, of course.

<note type="counts" resp="MG">Contains <measure commodity="words" quantity="24">24 words</measure>
in <measure commodity="sentences" quantity="1">one sentence</measure> using <measure commodity="chars">155 characters</measure>
</note>

Looks a bit silly, but it works!




Re: abstract layer instead of tei:p

James Cummings


Mathias,


Although <sp> needs to have a <p>, <l>, <lg> or similar in it, that child does not itself have to contain text. So for each paragraph (or whatever) you redact, you could replace it with an empty version:


===

<sp who="#JoannaDoe">
   <speaker>Joanna Doe</speaker>
   <p rend="redacted"/>
   <p rend="redacted"/>
   <p rend="redacted"/>
   <note type="counts" resp="#MG">Contains 
      <measureGrp>
         <measure commodity="words" quantity="150">150 words</measure> 
         in <measure commodity="sentences" quantity="5">five sentences</measure>
         in <measure commodity="paragraphs" quantity="3">three paragraphs</measure>
         using <measure commodity="chars">666 characters</measure>
      </measureGrp>
   </note>
</sp>

===

Here I follow it with a note similar to Lou's suggestion (though I've fixed his @resp); for some reason I felt compelled to wrap things in a <measureGrp> referring to this speech, but that is unnecessary. Also note that your earlier XPath was wrong, since the @who attribute will be on the <sp>, not the <speaker>. Although some might not like these redacted paragraphs, one benefit is that they give you a sense of the structure of the redacted portions (and a count).
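
The redaction above could be automated along these lines. This is a minimal sketch using Python's standard ElementTree, not a tested pipeline: the helper names `q` and `redact_sp` are hypothetical, only <p> children are handled (not <l> or <lg>), and the <measureGrp> wrapper and the sentence count are left out for brevity. Note that @who is read from the <sp>.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("tei", TEI_NS)


def q(name: str) -> str:
    """Qualify a local name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def redact_sp(sp: ET.Element, resp: str = "#MG") -> ET.Element:
    """Empty every tei:p child of a tei:sp and append a tei:note with counts."""
    paragraphs = sp.findall(q("p"))
    # Collect the full text of each paragraph, including nested phrase-level markup.
    texts = ["".join(p.itertext()).strip() for p in paragraphs]
    for p in paragraphs:
        for child in list(p):
            p.remove(child)
        p.text = None
        p.set("rend", "redacted")
    note = ET.SubElement(sp, q("note"), {"type": "counts", "resp": resp})
    counts = (
        ("words", sum(len(t.split()) for t in texts)),
        ("paragraphs", len(paragraphs)),
        ("chars", sum(len(t) for t in texts)),
    )
    for commodity, quantity in counts:
        ET.SubElement(
            note, q("measure"), {"commodity": commodity, "quantity": str(quantity)}
        )
    return sp


source = f"""<sp xmlns="{TEI_NS}" who="#JoannaDoe">
  <speaker>Joanna Doe</speaker>
  <p>Lorem ipsum dolor sit amet.</p>
  <p>Sed diam <hi>voluptua</hi>.</p>
</sp>"""
sp = redact_sp(ET.fromstring(source))
print(sp.get("who"))  # the speaker reference lives on tei:sp
print(ET.tostring(sp, encoding="unicode"))
```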


Your idea of replacing with Lorem ipsum is also a possibility (just fill the redacted paragraphs automagically with n characters).... You could even retain any phrase-level markup and just replace its text but that seems like an unnecessary complication.
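
One way to read "a dummy that shares the same features" is to swap every letter for filler while leaving spaces, punctuation, digits and capitalisation in place, so that word, sentence and character counts survive. A minimal Python sketch under that assumption (the name `dummy_text` and the filler string are made up for illustration):

```python
from itertools import cycle

# Filler letters drawn from the usual placeholder text, spaces removed.
FILLER = "loremipsumdolorsitametconsetetursadipscingelitr"


def dummy_text(text: str) -> str:
    """Replace each letter with filler; keep everything else and the letter case."""
    letters = cycle(FILLER)
    out = []
    for ch in text:
        if ch.isalpha():
            repl = next(letters)
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            # Spaces, punctuation and digits are copied unchanged.
            out.append(ch)
    return "".join(out)


original = "Hello, World! It works."
masked = dummy_text(original)
print(masked)
```

Because the filler is cycled blindly, an individual letter may occasionally coincide with the original, so this preserves shape rather than guaranteeing that nothing of the source text remains.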


Just another perspective. 

James 


--

Dr James Cummings, [hidden email]

School of English Literature, Language, and Linguistics, Newcastle University

