w/@part meaning of initial

Eduard Drenth
Dear all,

A tei:w can, by means of @part, be part of a word together with other tei:w elements. @part can have the value "initial".

What is the meaning of "initial"? First occurrence in a text? First part of the reconstructed word (which may occur as last in text)?

Regards,

Eduard Drenth, Software Architekt

[hidden email]

Doelestrjitte 8
8911 DX  Ljouwert
+31 58 234 30 47
+31 62 094 34 28 (privé)

gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43

Re: w/@part meaning of initial

Syd Bauman-10
The latter.

The @part attribute is used to indicate that a set of XML elements
should be considered a single "aggregate element" for analytic
purposes (and in which order they should be considered).

   <w part="I">antici</w>
   <w>say</w>
   <w>it</w><pc>!</pc>
   <w part="F">pation</w>


> A tei:w can be a @part of a word together with other tei:w. @part
> can have a value "initial".
>
> What is the meaning of "initial"? First occurrence in a text? First
> part of the reconstructed word (which may occur as last in text)?

Re: w/@part meaning of initial

C. M. Sperberg-McQueen
On Sep 4, 2017, at 7:21 AM, Syd Bauman <[hidden email]> wrote:

>
> The latter.
>
> The @part attribute is used to indicate that a set of XML elements
> should be considered a single "aggregate element" for analytic
> purposes (and in which order they should be considered).
>
>   <w part="I">antici</w>
>   <w>say</w>
>   <w>it</w><pc>!</pc>
>   <w part="F">pation</w>

Perhaps “in which order they should be considered” is a bit strong:
unlike @next and @prev, which point explicitly to the next or
previous part of an aggregate element, @part merely identifies the
part as the initial, medial, or final part of a fragmented element (or as
fragmented in an unspecified way, or as not a fragment at all).

If three different speakers each utter part of a word, in order,
you get something like this:

  <u who="#hughie"><w part="I">Anti</w>-</u>
  <u who="#louis">-<w part="M">cipa</w>--</u>
  <u who="#dewey">-<w part="F">tion</w>-</u>

If Louis speaks too soon, so the actual order of fragments in the
document becomes “-cipa-”, “Anti-”, “-tion”, @part cannot be used
to reconstruct the word “Anticipation”; one can leave it unspecified
(which makes it harder for a smart indexer to add an entry for
the word “Anticipation” at this location in the text), or one can
use @next and @prev.
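For comparison, here is a minimal Python sketch of the @next/@prev alternative. The out-of-order case is encoded with pointers, and a processor follows @next from the head of each chain. The xml:id values (f1, f2, f3), the wrapping <body> element, and the function name are hypothetical. The TEI namespace is omitted for brevity, and the sketch assumes each pointer is a same-document "#id" reference.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical encoding of "Louis speaks too soon": document order is
# Louis, Hughie, Dewey, but @next/@prev record the logical order.
SAMPLE = """<body>
  <u who="#louis">-<w xml:id="f2" prev="#f1" next="#f3">cipa</w>--</u>
  <u who="#hughie"><w xml:id="f1" next="#f2">Anti</w>-</u>
  <u who="#dewey">-<w xml:id="f3" prev="#f2">tion</w>-</u>
</body>"""


def reassemble_by_next(root):
    """Return the text of each aggregate, following @next from each chain head."""
    by_id = {el.get(XML_ID): el for el in root.iter("w") if el.get(XML_ID)}
    # Any id that some @next points at is a non-initial fragment.
    targets = {el.get("next").lstrip("#") for el in by_id.values() if el.get("next")}
    words = []
    for el in root.iter("w"):
        wid = el.get(XML_ID)
        if wid is None or wid in targets or el.get("next") is None:
            continue  # not the head of a chain
        parts, seen = [], set()
        while el is not None and el.get(XML_ID) not in seen:  # guard against cycles
            seen.add(el.get(XML_ID))
            parts.append(el.text or "")
            nxt = el.get("next")
            el = by_id.get(nxt.lstrip("#")) if nxt else None
        words.append("".join(parts))
    return words


print(reassemble_by_next(ET.fromstring(SAMPLE)))  # ['Anticipation']
```

Because the logical order is carried by explicit pointers, document order does not matter here, which is exactly what @part alone cannot offer.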
 
>
>
>> A tei:w can be a @part of a word together with other tei:w. @part
>> can have a value "initial".
>>
>> What is the meaning of "initial"? First occurrence in a text? First
>> part of the reconstructed word (which may occur as last in text)?

First part of the reconstructed word, but not occurring last in the
text:  the aggregate is assumed to start with an element marked
@part=“I”, continue through zero or more following elements
marked @part=“M”, and end with an element marked @part=“F”.
(At least, that is my understanding of the mechanism; the prose
of chapter 20 and of section 16.3 does not say this explicitly.)
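A minimal Python sketch of that reading, assuming (as above) that fragments appear in document order and that fragmentation is not nested. The sample is the in-order three-speaker example with straight quotes so that it parses; the wrapping <body> element and the function name are hypothetical, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

# The in-order three-speaker example (hypothetical <body> wrapper, no namespace).
SAMPLE = """<body>
  <u who="#hughie"><w part="I">Anti</w>-</u>
  <u who="#louis">-<w part="M">cipa</w>--</u>
  <u who="#dewey">-<w part="F">tion</w>-</u>
</body>"""


def reassemble_by_part(root):
    """Join each run I, M*, F of <w> fragments, in document order, into one string.

    Assumes no nested fragmentation. Elements without @part are skipped, so
    intervening whole words (such as "say" and "it") do not interfere.
    """
    words, current = [], None  # current: fragments of the aggregate being built
    for w in root.iter("w"):
        part = w.get("part")
        if part is None:
            continue
        if part == "I" and current is None:
            current = [w.text or ""]
        elif part == "M" and current is not None:
            current.append(w.text or "")
        elif part == "F" and current is not None:
            current.append(w.text or "")
            words.append("".join(current))
            current = None
        elif part in ("Y", "N") and current is None:
            continue  # not fragmented, or fragmented in an unspecified way
        else:
            raise ValueError(f"part={part!r} does not fit the I M* F pattern here")
    if current is not None:
        raise ValueError("an aggregate opened with part='I' was never closed")
    return words


print(reassemble_by_part(ET.fromstring(SAMPLE)))  # ['Anticipation']
```

The sketch raises an error rather than guessing when fragments arrive out of order (for instance, if Louis speaks too soon), since under this reading @part by itself does not say how to regroup them.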

 

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: w/@part meaning of initial

Eduard Drenth
Thank you, one more question about this remark:

If Louis speaks too soon, so the actual order of fragments in the
document becomes “-cipa-”, “Anti-”, “-tion”, @part cannot be used
to reconstruct the word “Anticipation”;

When encoded like this:

  <u who="#louis">-<w part="M">cipa</w>--</u>
  <u who="#hughie"><w part="I">Anti</w>-</u>
  <u who="#dewey">-<w part="F">tion</w>-</u>

“Anticipation” can be reconstructed based on "I", "M" and "F", or am I missing something?

________________________________________
From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of C. M. Sperberg-McQueen <[hidden email]>
Sent: Monday, September 4, 2017 8:11 PM
To: [hidden email]
Subject: Re: w/@part meaning of initial

Re: w/@part meaning of initial

C. M. Sperberg-McQueen
> On Sep 4, 2017, at 3:13 PM, Eduard Drenth <[hidden email]> wrote:
>
> Thank you, one more question about this remark:
>
> If Louis speaks too soon, so the actual order of fragments in the
> document becomes “-cipa-”, “Anti-”, “-tion”, @part cannot be used
> to reconstruct the word “Anticipation”;
>
> When encoded like this:
>
>  <u who="#louis">-<w part="M">cipa</w>--</u>
>  <u who="#hughie"><w part="I">Anti</w>-</u>
>  <u who="#dewey">-<w part="F">tion</w>-</u>
>
> “Anticipation” can be reconstructed based on "I", "M" and "F", or am I missing something?

If these are the only occurrences of the ‘part’ attribute in the
document, yes.  In general, one cannot assume that to be the case.  If
the example just quoted is preceded or followed or both by further
fragmented words spoken out of order, it would be dangerous to assume
that this particular group of fragments all belongs together.

At the risk of flogging a dead horse, consider the following example.
Given a document containing the following sequence of w elements (I
omit their contexts), what words would one reconstruct?  Which initial
parts belong with which final parts?  There are more part=“F” elements
here than part=“I” — does that mean we are looking at a fragment, and
must look further, earlier or later in the document, for the missing
initial parts?  Or does it indicate that something has gone grievously
wrong with the document?

<w part="F">scoo</w>
<w part="F">doo</w>
<w part="N">bi</w>
<w part="F">do</w>
<w part="N">wop</w>
<w part="F">be</w>
<w part="I">bop</w>
<w part="Y">pop</w>
<w part="N">fa</w>
<w part="F">la</w>
<w part="N">la</w>
<w part="Y">la</w>
<w part="Y">be</w>
<w part="M">bee</w>
<w part="I">bay</w>

As far as I can tell, the logical units (the virtual elements, if you
will) and the sequence of fragments within each logical unit can be
successfully reconstructed from repeated instances of fragmentation
encoded using the ‘part’ attribute if (but as far as I can tell only
if):

  (a) there are no nested fragmentations, and
 
  (b) instances of the part attribute form a sequence of values
  matching the regular expression (Y | N | (I M* F))*.
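Condition (b) lends itself to a mechanical check. A minimal Python sketch, assuming the @part values have already been collected in document order from the elements that carry the attribute, and leaving condition (a) (nesting) out of scope; the names are hypothetical, and a real project might express the same test in Schematron instead:

```python
import re

# Condition (b) as a Python regular expression over one-letter @part values
# in document order: (Y | N | (I M* F))*
PART_SEQUENCE = re.compile(r"(?:Y|N|IM*F)*")
PART_VALUES = {"Y", "N", "I", "M", "F"}


def part_sequence_ok(values):
    """Return True if the sequence of @part values satisfies condition (b)."""
    if any(v not in PART_VALUES for v in values):
        return False  # not one of the five values P5 defines
    return PART_SEQUENCE.fullmatch("".join(values)) is not None


# The @part values of the scoo/doo example, in document order.
scat = ["F", "F", "N", "F", "N", "F", "I", "Y", "N", "F", "N", "Y", "Y", "M", "I"]

print(part_sequence_ok(["I", "F"]))       # True: Syd Bauman's example
print(part_sequence_ok(["M", "I", "F"]))  # False: Louis speaks too soon
print(part_sequence_ok(scat))             # False
```

In line with the argument above, the reordered Louis/Hughie/Dewey sequence fails the check even though a reader could reassemble that one word by hand.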

If a project I were involved in were using the part attribute I would
be inclined to enforce these two properties by means of
project-specific validation (e.g. via Schematron rules).  And the
'part' attribute and its values don't seem to me to make design sense
if these properties are not guaranteed.  But the text of P5 does not
formulate either condition explicitly as a rule.  P5 mentions
condition (a) only in the form of advice; if the constraint is
violated, users of the document are likely to be confused.  And
condition (b) can be read out of the text of P5 only if we take the
word "fragmented" to mean "broken into smaller pieces, BUT NOT
REORDERED" in the description of 'part':

    part     specifies whether or not its parent element is fragmented in
               some way, typically by some other overlapping
               structure: for example a speech which is divided
               between two or more verse stanzas, a paragraph which is
               split across a page division, a verse line which is
               divided between two speakers.

I think that's not an unreasonable reading of the word 'fragmented',
but perhaps not all readers of P5 will agree.

With our language-lawyer hats on, we can observe that the case of
material whose logical order is not document order is explicitly
described (and an example given) in the discussion of 'next' and
'prev'.  Does the absence of that case from the discussion of 'part'
suggest (1) that 'part' is not intended to cover that case, or (2)
that it was thought to be obvious how to apply 'part' in such cases?
As a reader, I lean towards (1), if only because I don’t see how to
apply ‘part’ in cases where document order and logical order are
at odds and the document contains more than one logical element
which has been broken into fragments.

I hope this helps explain what I meant. Your reading of P5 may
of course differ from mine.


Re: w/@part meaning of initial

Eduard Drenth
Right, hence the @next and @prev, thanks!

________________________________________
From: C. M. Sperberg-McQueen <[hidden email]>
Sent: Tuesday, September 5, 2017 1:48 AM
To: Eduard Drenth
Cc: C. M. Sperberg-McQueen; [hidden email]
Subject: Re: w/@part meaning of initial

Re: w/@part meaning of initial

Lou Burnard-6
In reply to this post by C. M. Sperberg-McQueen
It might perhaps be appropriate to propose a clarification in the text
to the effect that @part is NOT intended to do the same as @next/@prev
-- i.e. that it assumes the document order corresponds with the logical
order within which the @part attribute asserts that the fragment
indicated is "initial" or "final" or whatever. Aside from common sense,
a good reason for thinking so is that you could conceivably combine
these attributes -- by saying that the "initial" fragment is wherever
@next is pointing to, for example. (Though it makes my head hurt to
imagine this used in practice.) Certainly it's worth stating that the
primary use case for @part is where the document structure causes
fragmentation (e.g. verse lines broken by speech boundaries), not where
components of a transcription are to be reassembled in a non-document order.



