Fwd: trojan horse

Fwd: trojan horse

Lou Burnard-6

I'll comment on Michael's actual questions in another post, but first I'd like to get this off my chest... why oh why oh why oh why is it felt to be so important that the element name used to mark the start or end of a stretch of horse-delimited markup should have to be the same as the name of a regular content-containing element? Why abuse an existing element, introduce ambiguity (sometimes a horse-end might actually be a real empty non-horse after all), and complexify processing?

Given that this markup is going to have to be specially processed anyway, why not make the name of the thing that the span represents an attribute as well? or use a different gi for it?
Concretely, why is

<q sID="foo"/> .... <q eID="foo"/>

considered superior to the entirely TEI-conformant

<milestone type="horseEnd" unit="q_start" n="foo"/>

<milestone type="horseEnd" unit="q_end" n="foo"/>

given that the former requires a (semantically dubious) modification to special-case (some but not all) empty <q> elements? True, it's more verbose, but that's arguably why it's better: no magic here.

Or if you don't like combining two pieces of information in a single attribute (@unit), why not make a conformant extension such as

<horse:tag type="start" result="q" id="foo"/>

<horse:tag type="end" result="q" id="foo"/>

Finding all the real or potential q elements in a single sweep through  the document is going to be a bit more complex whatever solution you adopt.




Re: Fwd: trojan horse

Peter Flynn-8

On 26 October 2017 11:06:30 Lou Burnard <[hidden email]> wrote:

> I'll comment on Michael's actual questions in another post, but first
> I'd like to get this off my chest... why oh why oh why oh why is it felt
> to be so important that the element name used to mark the start or end of a
> stretch of horse-delimited markup should have to be the same as the name
> of a regular content-containing element?

As far as I can make out, because it means what it says: it's a quote. It's marked differently from the normal content-containing q but it's still a quote. Inventing a new name would be perverse, and IMNSHO would constitute pollution of the namespace.

Flynn's Rule (which I just invented :-) says that variant applications of an element type should be marked in attributes, not by adding to the pool of element type names.

> Why abuse an existing element,

If the span denoted by the proposed usage is a quote, this is not abuse.


> introduce ambiguity (sometimes a horse-end might actually be a real
> empty non-horse after all),

I thought that was what the rules in schemas and Schematron were for: enforcing specific uses; eg if q has an @eID, and the matching @xml:id is on a preceding q, then content in both is disallowed.
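
To make the kind of rule described above concrete, here is a minimal sketch in Python. It is not Schematron and not anything proposed in this thread; it only illustrates the check. Assumptions: the markers are un-namespaced q elements paired by @sID/@eID as in Lou's example (rather than by @xml:id), and the function name check_trojan_q is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_trojan_q(xml_text):
    """Return a list of problems found among Trojan-style <q> markers.

    Assumption for illustration: a marker is a <q> carrying sID or eID,
    it must be empty, and every eID must match an earlier, still-open sID.
    """
    root = ET.fromstring(xml_text)
    problems = []
    open_ids = set()
    for el in root.iter("q"):  # iter() visits elements in document order
        sid, eid = el.get("sID"), el.get("eID")
        if sid is None and eid is None:
            continue  # a conventional, content-containing q: nothing to check
        if sid is not None and eid is not None:
            problems.append(f"q carries both sID={sid!r} and eID={eid!r}")
            continue
        if (el.text and el.text.strip()) or len(el):
            problems.append(f"marker {sid or eid!r} has content")
        if sid is not None:
            if sid in open_ids:
                problems.append(f"start marker {sid!r} opened twice")
            open_ids.add(sid)
        elif eid not in open_ids:
            problems.append(f"end marker {eid!r} has no preceding start marker")
        else:
            open_ids.discard(eid)
    for sid in sorted(open_ids):
        problems.append(f"start marker {sid!r} is never closed")
    return problems


if __name__ == "__main__":
    print(check_trojan_q('<p><q sID="foo"/>some text<q eID="foo"/></p>'))
    print(check_trojan_q('<p><q eID="foo"/><q sID="bar">x</q></p>'))
```

A real deployment would more likely express the same constraints declaratively in the schema; the sketch only shows that they are mechanically checkable.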

> and complexify processing?


The corollary of Flynn's Rule must surely be that additional processing is unavoidable — as it would be no matter which way you do it.


> Concretely, why is
>
> <q sID="foo"/> .... <q eID="foo"/>


@sID is probably unnecessary: @xml:id should be adequate.


> considered superior to the entirely TEI-conformant
>
> <milestone type="horseEnd" unit="q_start" n="foo"/>
>
> <milestone type="horseEnd" unit="q_end" n="foo"/>


a) it's shorter (pace XML spec)

b) if you want me to be able to retrieve or process a span of this type (in either format) in a sensible manner, I require an ID and a matching IDREF; otherwise all bets are off.


> given that the former requires a (semantically dubious) modification to special-case (some but not all) empty <q> elements?
> True it's more verbose, but that's arguably why it's better: no magic here.

The latter is a matter of taste; I prefer explicit brevity. But I would have thought that after 30 years we would have grokked the utility of start—end markers for overlap and settled on a single attribute to hold the IDREF of the terminator (given that xml:id is already available on everything).


> Or if you don't like combining two pieces of information in a single attribute (@unit), why not make a conformant extension such as
>
> <horse:tag type="start" result="q" id="foo"/>
>
> <horse:tag type="end" result="q" id="foo"/>


Adding Yet Another Namespace simply makes a horse's rear end of the whole thing.


> Finding all the real or potential q elements in a single sweep through  the document is going to be a bit more complex whatever solution you adopt.


I suspect that XPath2/3 is well up to the task.
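
As a rough illustration of such a single sweep, here is a sketch in Python rather than XPath 2/3 (a swapped-in technique, chosen only so the example is runnable with a standard library). Assumptions: un-namespaced q elements, Trojan markers paired by @sID/@eID, and a hypothetical function name sweep_quotes.

```python
import xml.etree.ElementTree as ET


def sweep_quotes(xml_text):
    """Collect the text of every quote, conventional or Trojan, in one pass.

    Returns (kind, text) pairs in the order in which the quotes end.
    """
    root = ET.fromstring(xml_text)
    finished = []
    open_spans = {}  # Trojan id -> text pieces collected since its start marker

    def add_text(text, stack):
        # Text belongs to every enclosing conventional q and every open span.
        if text:
            for buf in stack:
                buf.append(text)
            for buf in open_spans.values():
                buf.append(text)

    def walk(el, stack):
        sid, eid = el.get("sID"), el.get("eID")
        buf = None
        if el.tag == "q" and sid is not None:
            open_spans[sid] = []
        elif el.tag == "q" and eid is not None:
            pieces = open_spans.pop(eid, None)
            if pieces is not None:
                finished.append(("trojan", "".join(pieces)))
        elif el.tag == "q":
            buf = []
            stack = stack + [buf]
        add_text(el.text, stack)
        for child in el:
            walk(child, stack)
            add_text(child.tail, stack)  # a child's tail is content of el
        if buf is not None:
            finished.append(("nested", "".join(buf)))

    walk(root, [])
    return finished


if __name__ == "__main__":
    sample = ('<p>He said <q>hello</q> and '
              '<l><q sID="a"/>one</l><l>two<q eID="a"/></l></p>')
    print(sweep_quotes(sample))
```

The second quote in the sample starts in one l element and ends in the next, which is the overlap case the markers exist for.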

As ever, your kilometrage may vary...

///Peter

Written on a phone on board ship, so please excuse typos.


Re: Fwd: trojan horse

Lou Burnard-6
Hi Peter

I'm on a Eurostar rather than a boat, so I'll be brief.

I think you're missing a rather crucial point of Michael's proposal: the linking between the two ends of the horse is effected by co-reference, not by ID/IDREF, nor by URI. 

The reason I say it's a bad idea to special-case <q/> (with appropriate attributes) is that it reduces my ability to introduce a genuinely empty <q> in a document without ambiguity. But this is not an opinion everyone shares (see the earlier discussion on this list, s.v. "tagging verse and quotes" or some such).

Your objection to using a different namespace is quaint, but I think you'll find that train has already left the station. A different namespace is precisely what you need in order to make clear that your usage is not a TEI-specified one. Why needlessly obscure the primary goal of all this effort, which is to make your coding comprehensible to someone else? If I find a <tag> in a document which is in the TEI namespace, I expect that element to respect the semantics and syntax defined by the TEI. If it doesn't, it should be in some other namespace.

Lou


From: TEI (Text Encoding Initiative) public discussion list [[hidden email]] on behalf of Peter Flynn [[hidden email]]
Sent: 26 October 2017 12:30
To: [hidden email]
Subject: Re: Fwd: trojan horse


Re: Fwd: trojan horse

Piotr Bański
Just a tidbit.
Lou, careful as always, says:

 > If I find a <tag> in a document which is in the TEI
 > namespace, I expect that element to respect the semantics and syntax
 > defined by the TEI

The crucial fragment is, naturally, "and syntax". In the case of
overlap, the syntax violation is arguably a result of the clash between
the Platonic TEI abstract model and the limitations of the current TEI
serialization format, namely TEI XML.

HORSE, or its evolved or degenerate variants, attempt to maintain the
TEI-defined semantics at all costs, with mechanisms that cut across the
XML syntactic shackles. XML syntax is in this case not a virtue to be
cherished but at best a compromise to be suffered. Much as I am a fan of
tag abuse (in the positive sense...)[1], I can't even see this as tag
abuse. Rather as a rescue mechanism.

In essence, this feels not like a fight for the truth but rather like
confusion as to the world in which the truth conditions are to be
checked. I'm not saying I know the solution; I would just like to draw
attention to the plane at which the "conflict" appears to take place.

Best,

   Piotr


[1]: ...mostly...


On 10/26/17 15:30, Lou Burnard wrote:

> Hi Peter
>
> I'm on a Eurostar rather than a boat, so I'll be brief.
>
> I think you're missing a rather crucial point of Michael's proposal: the
> linking between the two ends of the horse is effected by co-reference,
> not by ID/IDREF, nor by URI.
>
> The reason I say it's a bad idea to special-case <q/> (with appropriate
> attributes) is that it reduces my ability to introduce a genuinely empty
> <q> in a document without ambiguity. But this is not an opinion everyone
> shares (see the earlier discussion on this list, s.v. "tagging verse
> and quotes" or some such).
>
> Your objection to using a different namespace is quaint, but I think
> you'll find that train has already left the station. A different
> namespace is precisely what you need in order to make clear that your
> usage is not a TEI-specified one. Why needlessly obscure the primary
> goal of all this effort, which is to make your coding comprehensible to
> someone else? If I find a <tag> in a document which is in the TEI
> namespace, I expect that element to respect the semantics and syntax
> defined by the TEI. If it doesn't, it should be in some other namespace.
>
> Lou

Re: trojan horse

C. M. Sperberg-McQueen
In reply to this post by Lou Burnard-6
On Oct 26, 2017, at 4:05 AM, Lou Burnard <[hidden email]> wrote:


> I'll comment on Michael's actual questions in another post, but
> first I'd like to get this off my chest... why oh why oh why oh why
> is it felt to be so important that the element name used to mark the
> start or end of a stretch of horse-delimited markup should have to be
> the same as the name of a regular content-containing element?

Excellent question!  I am sorry that it seems to be such a painful one
for you, but it's a good question regardless.  And I look forward to
your comments on the questions I asked.

I think the primary rationale explicitly offered by DeRose 2004 is
given in the following paragraphs of his paper:

>> The advantages of Trojan milestones include those of milestones in
>> general, with improved readability. They are also easy to teach
>> (the milestoned form is, to the user, just a slight syntax
>> variation on the normal form). A hierarchical subset of the markup
>> can also be designed in, simply by choosing some elements that may
>> not be "milestoned" (that is, which do not permit empty content,
>> and/or do not permit the sID/eID attributes). For example, OSIS
>> defines that the verse hierarchy is always secondary to the
>> linguistic/rhetorical hierarchy, and so many tags in the latter are
>> not milestonable. If counterexamples arise, the schema change to
>> add them is entirely backward-compatible.

>> The advantage that (unlike generic milestones) Trojan milestones
>> look like element tags (that is, they have the same GI) should not
>> be underestimated; while unlike extra milestone elements with
>> derived GIs, they do not expand the list of element types.

> Why abuse an existing element, introduce ambiguity (sometimes a
> horse-end might actually be a real empty non-horse after all), and
> complexify processing?

There is no ambiguity in the Trojan Horse proposal.  A q element with
either @sID or @eID is not a conventional q element; a q element with
neither of those attributes is not a Trojan Horse marker.

In the deployment scenario described by DeRose 2004 in which overlap
applies to a relatively small number of element types and/or element
instances, compared to the entire document (so that CONCUR might feel
a bit heavyweight, since the majority of element instances will be the
same in all concurrent hierarchical structures), there is also no
reason to use Trojan Horse milestones for empty elements:  no empty
element will ever be inexpressible without special means.

In the concurrent-structures use case I'm preparing for, this is not
necessarily true, so my draft transformation for translating from
conventional (aka 'nesting', or 'deep' format) XML into Trojan Horse
form uses a third attribute (@soleID) to signal that a given element
(which must be empty to be valid) is a Trojan-Horse representation of
an empty element in the secondary document hierarchy.  (It's not clear
whether this form will feel necessary or useful to anyone who is not
thinking as I am of CONCUR as the model being realized at least for
part of the document.  I would like a wholly automatic lossless
conversion between two different document structures, and for that
purpose it's helpful to distinguish conventional and Trojan-Horse
empty elements.  YMMV, of course.)
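
The distinction drawn above can be stated as a small classification function. This is a minimal Python sketch, not part of the draft transformation itself; the function name classify and the decision to reject elements carrying more than one marker attribute are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Marker attribute -> role of the element that carries it.
ROLES = {"sID": "start", "eID": "end", "soleID": "sole"}


def classify(el):
    """Say whether an element is a Trojan Horse marker or a conventional one.

    An element with none of the marker attributes is conventional, even if
    it happens to be empty; one with exactly one of them is a marker.
    """
    present = [name for name in ROLES if el.get(name) is not None]
    if not present:
        return "conventional"
    if len(present) > 1:
        # Assumption: combining marker attributes is treated as an error.
        raise ValueError(f"conflicting marker attributes: {present}")
    return ROLES[present[0]]


if __name__ == "__main__":
    doc = ET.fromstring(
        '<p><q sID="a"/><q eID="a"/><q soleID="b"/><q/><q>hi</q></p>'
    )
    print([classify(q) for q in doc])
```

Under these rules an empty q with no marker attribute stays a conventional empty element, so the two kinds of empty element never collide.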

> Given that this markup is going to have to be specially processed
> anyway, why not make the name of the thing that the span represents
> an attribute as well? or use a different gi for it?  Concretely, why
> is

>     <q sID="foo"/> .... <q eID="foo"/>

> considered superior to the entirely TEI-conformant

>     <milestone type="horseEnd" unit="q_start" n="foo"/>

>     <milestone type="horseEnd" unit="q_end" n="foo"/>

> given that the former requires a (semantically dubious) modification
> to special-case (some but not all) empty <q> elements? True it's
> more verbose, but that's arguably why it's better: no magic here.

The passage from DeRose 2004 quoted above explains his reasons.

I think he's right on the human readability issue, although (a)
readability is subjective and (b) readability is tightly bound up with
familiarity, so one could perhaps change one's mind given enough work
with 'milestone' elements.

The TEI 'milestone' element, however, has the drawback that it does
not seem to provide a reliable way to represent the attributes of the
virtual element.  If instead of the example shown we write

    <q sID="foo" who="#YHWH"/>
        ....
      <q sID="bar" who="#Jeremiah"/> .... <q eID="bar"/>
    <q eID="foo"/>

we seem not to have an equivalent representation using tei:milestone.

My draft transform would represent this as something like:

    <th:start sID="foo" gi="q">
        <th:avs name="who" value="#YHWH"/>
    </th:start>
    ....
        <th:start sID="bar" gi="q">
            <th:avs name="who" value="#Jeremiah"/>
        </th:start>
        ....
        <th:end eID="bar" gi="q"/>
    <th:end eID="foo" gi="q"/>

I infer from your note that you might find this more attractive, which
feels like a confirmation of my decision to include it as an option in
the design.
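
For readers who want to see the mapping mechanically, here is a minimal Python sketch of rewriting Trojan markers into the th:start / th:end form shown above. It is not the draft transform: the namespace URI bound to the th prefix is a hypothetical placeholder (the thread does not give one), the function name to_th_form is invented, and only @sID/@eID markers are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread does not say what "th" is bound to.
TH_NS = "http://example.org/ns/trojan-horse"
ET.register_namespace("th", TH_NS)


def to_th_form(xml_text):
    """Rewrite <x sID=.../> and <x eID=.../> as th:start and th:end elements.

    The original element name moves into @gi, and every other attribute of
    the marker becomes a th:avs child carrying @name and @value.
    """
    root = ET.fromstring(xml_text)
    for el in list(root.iter()):  # snapshot first: the loop adds children
        sid, eid = el.get("sID"), el.get("eID")
        if sid is None and eid is None:
            continue
        gi = el.tag
        others = {k: v for k, v in el.attrib.items() if k not in ("sID", "eID")}
        el.attrib.clear()  # clears attributes only; following text is kept
        if sid is not None:
            el.tag = f"{{{TH_NS}}}start"
            el.set("sID", sid)
        else:
            el.tag = f"{{{TH_NS}}}end"
            el.set("eID", eid)
        el.set("gi", gi)
        for name, value in others.items():
            ET.SubElement(el, f"{{{TH_NS}}}avs", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_th_form('<p><q sID="foo" who="#YHWH"/> .... <q eID="foo"/></p>'))
```

Because the attributes of the virtual element are carried as child elements, nothing is lost that the tei:milestone encoding would have had to drop.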

On my reading of the documentation for the 'milestone' element, however,
your example using 'milestone' for this seems to come uncomfortably
close to tag abuse or something very like it.  There is no
pre-existing canonical reference system involved here, and no
plausible sense in which there is some thing (call it X) which has
some value before the direct discourse, changes its value during the
direct discourse, and changes to another value (a third value? or back
to the first?) at the end of the direct discourse.  It is of course
possible to model containment in this way, as anyone who has tried to
shoe-horn XML functionality into COCOA markup knows.  But the notion
that there is some X changing values at start- and end-tags seems to
me to be an artefact of the COCOA-style mechanism.  COCOA markup, and
'milestone', work well for values that tessellate (a substantial region
of) a document; they seem awkward to me (well, "bogus" would be a more
accurate description) for other things that start and end.  That, at
any rate, is one reason I would be reluctant to use tei:milestone as
currently defined for Trojan Horse markup.  The attributes just clinch
it.


> Or if you don't like combining two pieces of information in a single
> attribute (@unit), why not make a conformant extension such as

>     <horse:tag type="start" result="q" id="foo"/>

>     <horse:tag type="end" result="q" id="foo"/>

Modulo the treatment of attributes on the virtual element (and the
choice of attribute and element names), I think this is very similar
to the th:start and th:end elements shown above.

> Finding all the real or potential q elements in a single sweep
> through the document is going to be a bit more complex whatever
> solution you adopt.

Yes, it is.  If we had plausible measures of complexity either for
markup or for XPath / XSLT / XQuery expressions, we might be able to
quantify the relative complexity of different solutions; for these
variants on milestone-style markup I would not expect dramatic
differences in complexity.  So I haven't seen any way in which
computational complexity can be used to argue for or against the
various milestone-like approaches to overlap syntax.

best,

Michael



********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: Fwd: trojan horse

Martin Holmes
In reply to this post by Piotr Bański
Perhaps we should define a standard TEI namespace for standard TEI
elements being used in this abusive(?) manner, just as we have one for
TEI elements being used as exemplars. Then it would be easy to
distinguish conventional usage from HORSE usage, but the element local
names would remain the same.

Cheers,
Martin

On 2017-10-26 07:13 AM, Piotr Bański wrote:

> Just a tidbit.
> Lou, careful as always, says:
>
>  > If I find a <tag> in a document which is in the TEI
>  > namespace, I expect that element to respect the semantics and syntax
>  > defined by the TEI
>
> The crucial fragment is, naturally, "and syntax". In the case of
> overlap, the syntax violation is arguably a result of the clash between
> the Platonic TEI abstract model and the limitations of the current TEI
> serialization format, namely TEI XML.
>
> HORSE, or its evolved or degenerate variants, attempt to maintain the
> TEI-defined semantics at all costs, with mechanisms that cut across the
> XML syntactic shackles. XML syntax is in this case not a virtue to be
> cherished but at best a compromise to be suffered. Much as I am a fan of
> tag abuse (in the positive sense...)[1], I can't even see this as tag
> abuse. Rather as a rescue mechanism.
>
> In essence, this feels not like a fight for the truth but rather like
> confusion as to the world in which the truth conditions are to be
> checked. I'm not saying I know the solution; I would just like to draw
> attention to the plane at which the "conflict" appears to take place.
>
> Best,
>
>    Piotr
>
>
> [1]: ...mostly...

Re: Fwd: trojan horse

Piotr Bański
... and, arguably, that would still fall under the clause of "legal
expression of the abstract model". An interesting idea.

    P.

On 10/26/17 18:17, Martin Holmes wrote:

> Perhaps we should define a standard TEI namespace for standard TEI
> elements being used in this abusive(?) manner, just as we have one for
> TEI elements being used as exemplars. Then it would be easy to
> distinguish conventional usage from HORSE usage, but the element local
> names would remain the same.
>
> Cheers,
> Martin
>
> On 2017-10-26 07:13 AM, Piotr Bański wrote:
>> Just a tidbit.
>> Lou, careful as always, says:
>>
>>  > If I find a <tag> in a document which is in the TEI
>>  > namespace, I expect that element to respect the semantics and syntax
>>  > defined by the TEI
>>
>> The crucial fragment is, naturally, "and syntax". In the case of
>> overlap, the syntax violation is arguably a result of the clash
>> between the Platonic TEI abstract model and the limitations of the
>> current TEI serialization format, namely TEI XML.
>>
>> HORSE, or its evolved or degenerate variants, attempt to maintain the
>> TEI-defined semantics at all costs, with mechanisms that cut across
>> the XML syntactic shackles. XML syntax is in this case not a virtue to
>> be cherished but at best a compromise to be suffered. Much as I am a
>> fan of tag abuse (in the positive sense...)[1], I can't even see this
>> as tag abuse. Rather as a rescue mechanism.
>>
>> In essence, this feels not like fight for the truth but rather as
>> confusion as to the world in which the truth conditions are to be
>> checked. I'm not saying I know the solution, I would just like to turn
>> attention to the plane at which the "conflict" appears to take place.
>>
>> Best,
>>
>>    Piotr
>>
>>
>> [1]: ...mostly...
>>
>>
>> On 10/26/17 15:30, Lou Burnard wrote:
>>> Hi Peter
>>>
>>> I'm on a eurostar rather than a boat, so I'll be brief.
>>>
>>> I think you're missing a rather crucial point of Michael's proposal:
>>> the linking between the two ends of the horse is effected by
>>> co-reference, not by ID/IDREF, nor by URI.
>>>
>>> The reason I say it's a bad idea to special-case <q/> (with
>>> appropriate attributes) is that it reduces my ability to introduce a
>>> genuinely empty <q> in a document without ambiguity.  But this is not
>>> an opinion every shares (see further earlier discussion on this list
>>> sv "tagging verse and quotes" or some such)
>>>
>>> Your objection to using a different namespace is quaint, but I think
>>> you'll find that train has already left the station. A different
>>> namespace is precisely what you need in order to make clear that your
>>> usage is not a TEI-specified one.  Why needlessly obscure the primary
>>> goal of all this effort, which is to make your coding comprehensible
>>> to someone else? If I find a <tag> in a document which is in the TEI
>>> namespace, I expect that element to respect the semantics and syntax
>>> defined by the TEI. If it doesn't it should be in some other namespace.
>>>
>>> Lou
>>>
>>> ------------------------------------------------------------------------
>>> *From:* TEI (Text Encoding Initiative) public discussion list
>>> [[hidden email]] on behalf of Peter Flynn
>>> [[hidden email]]
>>> *Sent:* 26 October 2017 12:30
>>> *To:* [hidden email]
>>> *Subject:* Re: Fwd: trojan horse
>>>
>>> On 26 October 2017 11:06:30 Lou Burnard
>>> <[hidden email]> wrote:
>>>
>>>  > I'll comment on Michael's actual questions in another post, but first
>>>  > I'd like to get this off my chest... why oh why oh why oh why is it
>>>  > felt to be so important that the element name used to mark the start
>>>  > or end of a stretch of horse-delimited markup should have to be the
>>>  > same as the name of a regular content-containing element?
>>>
>>> As far as I can make out, because it means what it says: it's a
>>> quote. It's marked differently from the normal content-containing q
>>> but it's still a quote. Inventing a new name would be perverse, and
>>> IMNSHO would constitute pollution of the namespace.
>>>
>>> Flynn's Rule (which I just invented :-) says that variant
>>> applications of an element type should be marked in attributes, not
>>> by adding to the pool of element type names.
>>>
>>>  > Why abuse an existing element,
>>>
>>> If the span denoted by the proposed usage is a quote, this is not abuse.
>>>
>>>
>>>  > introduce ambiguity (sometimes a horse-end might actually be a real
>>>  > empty non-horse after all),
>>>
>>> I thought that was what the rules in schemas and Schematron were for:
>>> enforcing specific uses; eg if q has an @eID, and the matching
>>> @xml:id is on a preceding q, then content in both is disallowed.
>>>
>>>  > and complexify processing?
>>>
>>>
>>> The corollary of Flynn's Rule must surely be that additional
>>> processing is unavoidable — as it would be no matter which way you do
>>> it.
>>>
>>>
>>>  > Concretely, why is
>>>  >
>>>  > <q sID="foo"/> .... <q eID="foo"/>
>>>
>>>
>>> @sID is probably unnecessary: @xml:id should be adequate.
>>>
>>>
>>>  > considered superior to the entirely TEI-conformant
>>>  >
>>>  > <milestone type="horseEnd" unit="q_start" n="foo"/>
>>>  >
>>>  > <milestone type="horseEnd" unit="q_end" n="foo"/>
>>>
>>>
>>> a) it's shorter (pace XML spec)
>>>
>>> b) if you want me to be able to retrieve or process a span of this
>>> type (in either format) in a sensible manner, I require an ID and a
>>> matching IDREF, otherwise all bets are off.
>>>
>>>
>>>  > given that the former requires a (semantically dubious)
>>>  > modification to special-case (some but not all) empty <q> elements?
>>>  > True it's more verbose, but that's arguably why it's better: no magic
>>>  > here.
>>>
>>> The latter is a matter of taste; I prefer explicit brevity. But I
>>> would have thought that after 30 years we would have grokked the
>>> utility of start—end markers for overlap and settled on a single
>>> attribute to hold the IDREF of the terminator (given that xml:id is
>>> already available on everything).
>>>
>>>
>>>  > Or if you don't like combining two pieces of information in a
>>>  > single attribute (@unit), why not make a conformant extension such as
>>>  >
>>>  > <horse:tag type="start" result="q" id="foo"/>
>>>  >
>>>  > <horse:tag type="end" result="q" id="foo"/>
>>>
>>>
>>> Adding Yet Another Namespace simply makes a horse's rear end of the
>>> whole thing.
>>>
>>>
>>>  > Finding all the real or potential q elements in a single sweep
>>>  > through the document is going to be a bit more complex whatever
>>>  > solution you adopt.
>>>
>>>
>>> I suspect that XPath2/3 is well up to the task.
>>>
>>> As ever, your kilometrage may vary...
>>>
>>> ///Peter
>>>
>>> Written on a phone on board ship, so please excuse typos.
>>>

Re: trojan horse

C. M. Sperberg-McQueen
In reply to this post by Lou Burnard-6
> On Oct 26, 2017, at 7:30 AM, Lou Burnard <[hidden email]> wrote:
>
> Hi Peter
>
> I'm on a eurostar rather than a boat, so I'll be brief.
>
> I think you're missing a rather crucial point of Michael's proposal: the linking between the two ends of the horse is effected by co-reference, not by ID/IDREF, nor by URI.  

Once you’re off the train, can you expound on this difference?

Whenever I have thought about the issue, my assumption over the
years has been that the simplest way to get cheap validation for the
sID/eID pairs is to declare one of them as ID and the other as IDREF.
Ad hoc validation (e.g. with Schematron) would be necessary to
check ordering and forbid other IDREFs pointing to the sID (assuming
one decided one wanted to forbid that).
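
To make that ad hoc check concrete, here is a minimal Python sketch (not part of any proposal in this thread) of what an application-level pass might look for: every @eID preceded by a matching @sID, and no @sID left open. The function name check_horse_pairs and the un-namespaced sample markup are assumptions made for illustration; a Schematron rule could presumably express the same conditions.

```python
import xml.etree.ElementTree as ET

def check_horse_pairs(xml_text):
    """Return a list of problems with sID/eID pairing, in document order."""
    problems = []
    open_ids = set()    # sID values seen and not yet closed
    closed_ids = set()  # sID values already matched by an eID
    # iter() visits elements in document order, which is what ordering needs.
    for el in ET.fromstring(xml_text).iter():
        sid, eid = el.get("sID"), el.get("eID")
        if sid is not None:
            if sid in open_ids or sid in closed_ids:
                problems.append(f"duplicate sID '{sid}'")
            else:
                open_ids.add(sid)
        if eid is not None:
            if eid in open_ids:
                open_ids.remove(eid)
                closed_ids.add(eid)
            elif eid in closed_ids:
                problems.append(f"second eID for '{eid}'")
            else:
                problems.append(f"eID '{eid}' has no preceding sID")
    problems.extend(f"sID '{s}' is never closed" for s in sorted(open_ids))
    return problems

# Hypothetical sample: a quotation that overlaps a line boundary.
print(check_horse_pairs('<p><q sID="a"/>one <l>two<q eID="a"/></l></p>'))
```

The sketch deliberately ignores whether the start and end markers share an element name; that would be one more condition of the same kind.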

It is true, however, that sID and eID could be co-indexed without
any schema-level support.  (And also true that the design in DeRose
2004 uses sID rather than id or xml:id for a reason:  if one uses
an already existent attribute for the ID, one needs to add some other
signal that this element is a Trojan Horse start-tag, not a
conventional sole-tag.   So if what you meant was that Peter Flynn’s
proposal to drop @sID in favor of @xml:id was a non-starter, I think
I’m with you.)
 
>
> The reason I say it's a bad idea to special-case <q/> (with appropriate attributes) is that it reduces my ability to introduce a genuinely empty <q> in a document without ambiguity.  

I don’t see the ambiguity:  your genuinely empty ‘q’ element cannot
validly carry an @sID attribute.  (Yes, this constraint leaves the expressive
power of DTDs and Relax NG and XSD 1.0 behind; one needs
Schematron or XSD 1.1 assertions, or application-level checking.
Life is hard sometimes.)

> But this is not an opinion everyone shares (see further earlier discussion on this list sv "tagging verse and quotes" or some such)
>
> Your objection to using a different namespace is quaint, but I think you'll find that train has already left the station. A different namespace is precisely what you need in order to make clear that your usage is not a TEI-specified one.  

Doesn’t documentation do that?  (I ask at the risk of being called quaint,
in one or more of its various senses.)

> Why needlessly obscure the primary goal of all this effort, which is to make your coding comprehensible to someone else? If I find a <tag> in a document which is in the TEI namespace, I expect that element to respect the semantics and syntax defined by the TEI. If it doesn't it should be in some other namespace.

This would be a stronger argument if it were clearer what parts of the
TEI syntax and semantics count as things that must be preserved and
which parts are subject to change, and in which ways.  It seems clear to me
that whether Trojan Horse markup is held to preserve or to violate the
semantics of ‘q’ depends very much on whether we think the semantics
apply to q only when used as a conventional XML element or also apply
to virtual elements marked by other means.  

If we hold that the semantics of ‘q’ apply only to XML elements, and not to
‘logical’ or ‘virtual’ elements constructed by other means (possibly from a
collection of nodes in an XML document), then you are on safe ground
with the claim that Trojan Horse markup using tei:q constitutes semantic
abuse.  The description of ‘q’ does, after all, start with the phrase “contains
material which is …"

But if we take that use of ‘contains’ as referring to the XML element, and
not the textual feature, then what are the semantics of the milestone tags
with which you proposed to tag direct discourse when ‘q’ cannot be used
because the direct discourse overlaps some other structure?  What are
the semantics of fragmented elements (with @part or @next and @prev)
and the logical wholes they represent?  What does <join result=“q” …/>
mean?  Is it legitimate to use join result=“q” to identify a quire in a book,
because we like the brevity of ‘q’ for ‘quire’?

We might hold on the other hand that the semantics of the TEI representation
of a quotation fragmented across multiple q elements, or constructed from
whole cloth by join or milestones (leaving aside the semantic issues of using
milestone when the text is not tessellated), are (or should be, in a
conforming usage) those of a ‘q’ element with the indicated content and a
specifiable set of attribute-value pairs.  But in that case, we must
apparently contemplate the idea that the semantics of ‘q’ do not in
themselves entail the representation of the logical structure by means of a
single XML element.  And in that case, it’s hard to see how you can make
stick your argument that Trojan Horse usage of ‘tei:q’ elements is
semantically erroneous.

best,

Michael


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: trojan horse

Lou Burnard-6
Thanks for providing what I must concede is a knock-down answer to my
question: the requirement to add further q-specific attributes to the
empty horse tag (specifically the @who in <q sID="foo" who="#MSM"/>) is
entirely reasonable, and hard to do with the generic milestone
alternative. Though I wonder whether that attribute should not also
logically be provided on the corresponding @eID tag.
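
As a rough illustration of why attributes on the start marker are convenient, the following Python sketch rebuilds each "virtual" q from its markers and carries along whatever other attributes the start marker has. It is only a sketch under stated assumptions: the function name virtual_spans is invented, the sample is un-namespaced, and attributes are read from the @sID marker alone (whether the @eID marker should repeat them is left open).

```python
import xml.etree.ElementTree as ET

def virtual_spans(xml_text, name="q"):
    """Collect (id, attributes, text) for each sID/eID-delimited span."""
    open_spans = {}  # sID value -> span under construction
    finished = []

    def add_text(text):
        # Any text seen while a span is open belongs to that span.
        if text:
            for span in open_spans.values():
                span["text"].append(text)

    def walk(el):
        sid, eid = el.get("sID"), el.get("eID")
        if el.tag == name and sid is not None:
            attrs = {k: v for k, v in el.attrib.items() if k != "sID"}
            open_spans[sid] = {"id": sid, "attrs": attrs, "text": []}
        elif el.tag == name and eid in open_spans:
            span = open_spans.pop(eid)
            finished.append((span["id"], span["attrs"], "".join(span["text"])))
        add_text(el.text)
        for child in el:
            walk(child)
            add_text(child.tail)  # text following the child, inside el

    walk(ET.fromstring(xml_text))
    return finished

# Hypothetical sample: one speech running across two verse lines.
sample = ('<lg><l><q sID="foo" who="#MSM"/>First line,</l>'
          '<l>second line.<q eID="foo"/> Not quoted.</l></lg>')
print(virtual_spans(sample))
```

With the milestone alternative the same walk would work, but the speaker would have to be smuggled into some generic attribute rather than read directly from @who.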

Your other comments are well taken: I had indeed always assumed that by
"coreference" you just meant "using the same value", with no specific
schema validation mechanism such as ID/IDREF, since the available
mechanisms all need to be complemented by ad hoc validation of some
kind. I may also have been disinclined to use @xml:id by the fact
that TEI P5 (in a rush of enthusiasm some time ago) transformed all
existing IDREF-valued attributes to permit anyURI, which slightly
complicates things like this. But for whatever reason, we agree that
Peter Flynn's suggestion is a nonstarter.

My discomfort with the idea of an empty <tei:q/> (or more generally any
other element), existing solely as an artefact of the markup or
conventionalised to mean something other than its obvious XML
significance ("here's a q which has no content") is not dispelled by
your other points, but I agree that I have not yet found a persuasive
counter argument; and, as I said, other reasonable people have already
disagreed with me on the matter.


You ask:

"What are the semantics of fragmented elements (with @part of @next and @prev)
and the logical wholes they represent? "

I think in this case the defined semantics are preserved. The presence
of attributes @part, or @next and @prev, serves to modify only the
notion of "completeness" implied by a start and end tag. So in e.g. a
passage of alexandrines one of which is split between two speakers, an
<l part="i"> is just an <l> which happens to be defective or incomplete.
This notion is so familiar and intuitive, I don't find any problem with
it. A defective alexandrine is still an alexandrine, more or less. If
one of my legs drops off, I don't stop being notionally a biped. Whereas
I am not sure that a tag which says "I am where a FOO might begin or
end" is really saying the same thing as "I am a FOO": certainly (as
Piotr remarks) it's a statement at a different level of discourse.

"What does <join result=“q” …/>
mean?  Is it legitimate to use join result=“q” to identify a quire in a book,
because we like the brevity of ‘q’ for ‘quire’?"


On the other hand, I've never been entirely convinced about this
particular piece of bricolage. I think the spec says somewhere that this
element has to appear only in a place where it would be legitimate to
put an occurrence of the @result element, but provides no clue as to how
on earth one is supposed to validate that. To answer your second
question though, presumably the rules for what is a valid @result value
say that it should be the GI of an element in the current schema, and
thus probably the answer is no, unless you have defined "my:q" to mean
quire (in which case it should be result="my:q").






Re: trojan horse

Peter Stadler
– a very interesting topic –
Just briefly, I’m with Lou in hesitating to allow all sorts of empty elements around, but I find Piotr’s analysis of this as “the clash between the Platonic TEI abstract model and the limitations of the current TEI serialization format” and Martin’s solution of a separate namespace very convincing!

Best
Peter

> On 27.10.2017 at 11:48, Lou Burnard <[hidden email]> wrote:
>
> Thanks for providing what I must concede is a knock-down answer to my question: the requirement to add further q-specific attributes to the empty horse tag (specifically the @who in <q sID="foo" who="#MSM"/>) is entirely reasonable, and hard to do with the generic milestone alternative. Though I wonder whether that attribute should not also logically be provided on the corresponding @eID tag.
>
> Your other comments are well taken: I had indeed always assumed that by "coreference" you just meant "using the same value", with no specific schema validation mechanism such as ID/IDREF, since the available mechanisms all need to be complemented by ad hoc validation of some kind. I may also have been disinclined to use @xml:id by the fact that TEI P5 (in a rush of enthusiasm some time ago) transformed all existing IDREF-valued attributes to permit anyURI, which slightly complicates things like this. But for whatever reason, we agree that Peter Flynn's suggestion is a nonstarter.
>
> My discomfort with the idea of an empty <tei:q/> (or more generally any other element), existing solely as an artefact of the markup or conventionalised to mean something other than its obvious XML significance ("here's a q which has no content") is not dispelled by your other points, but I agree that I have not yet found a persuasive counter argument; and, as I said, other reasonable people have already disagreed with me on the matter.
>
>
> You ask:
>
> "What are the semantics of fragmented elements (with @part of @next and @prev)
> and the logical wholes they represent? "
>
> I think in this case the defined semantics are preserved. The presence of attributes @part, or @next and @prev, serves to modify only the notion of "completeness" implied by a start and end tag. So in e.g. a passage of alexandrines one of which is split between two speakers, an <l part="i"> is just an <l> which happens to be defective or incomplete. This notion is so familiar and intuitive, I don't find any problem with it. A defective alexandrine is still an alexandrine, more or less. If one of my legs drops off, I don't stop being notionally a biped. Whereas I am not sure that a tag which says "I am where a FOO might begin or end" is really saying the same thing as "I am a FOO": certainly (as Piotr remarks) it's a statement at a different level of discourse.
>
> "What does <join result=“q” …/>
> mean?  Is it legitimate to use join result=“q” to identify a quire in a book,
> because we like the brevity of ‘q’ for ‘quire’?"
>
>
> On the other hand, I've never been entirely convinced about this particular piece of bricolage. I think the spec says somewhere that this element has to appear only in a place where it would be legitimate to put an occurrence of the @result element, but provides no clue as to how on earth one is supposed to validate that. To answer your second question though, presumably the rules for what is a valid @result value say that it should be the GI of an element in the current schema, and thus probably the answer is no, unless you have defined "my:q" to mean quire (in which case it should be result="my:q").
>
>
>
>
>

Re: trojan horse

Syd Bauman-10
In reply to this post by C. M. Sperberg-McQueen
Not addressing the whole thread now, just two little bits.

Michael -- have you looked at XCONCUR? I thought Oliver and Andreas
were working on extraction of each hierarchy from the source XCONCUR
document, which would do you nicely, I think.


CMSMcQ> Question 2: do any readers of this list have (or know of)
CMSMcQ> tools for this kind of operations on schemas or ODD
CMSMcQ> documents?

See
http://conferences.idealliance.org/extreme/html/2005/Bauman01/EML2005Bauman01.html.
Figure 3 supplies 8 or so lines of Perl that would convert an element
declaration in a standard TEI RELAX NG compact syntax schema from 2005
into HORSE. This particular code won't come close to working anymore,
as standard TEI RELAX NG schemas are quite different nowadays. But the
idea may prove helpful to you.


BTW, I have long thought that it would be nice if ODD processors
produced HORSE-ready schemas. I think it would be possible, but have
never actually tried a formal proof or even proof-of-concept code.


CMSMcQ> genuinely empty ‘q’ element cannot validly carry an @sID
CMSMcQ> attribute. (Yes, this constraint leaves the expressive power
CMSMcQ> of DTDs and Relax NG and XSD 1.0 behind; one needs Schematron
CMSMcQ> or XSD 1.1 assertions, or application-level checking. Life is
CMSMcQ> hard sometimes.)

Not sure what you mean, here. Certainly RELAX NG (and probably XSD)
can easily say things like "an empty <q> w/o @sID or @eID is allowed
here, but an empty <q> with @sID or @eID is not". But in truth, you
probably want them allowed in (mostly) the same places, in which case
even rule-based assertions are probably not going to help. A human is
going to have to check to see whether "nothing was quoted" was meant
or "start of something quoted" was meant. This doesn't bother me in
the slightest. Humans also have to check if "<q>" or "<said>" or
"<quote>" or "<mentioned>" was meant, too.

Re: trojan horse

C. M. Sperberg-McQueen
> On Oct 27, 2017, at 2:51 PM, Syd Bauman <[hidden email]> wrote:
> ...
>
> CMSMcQ> genuinely empty ‘q’ element cannot validly carry an @sID
> CMSMcQ> attribute. (Yes, this constraint leaves the expressive power
> CMSMcQ> of DTDs and Relax NG and XSD 1.0 behind; one needs Schematron
> CMSMcQ> or XSD 1.1 assertions, or application-level checking. Life is
> CMSMcQ> hard sometimes.)
>
> Not sure what you mean, here. Certainly RELAX NG (and probably XSD)
> can easily say things like "an empty <q> w/o @sID or @eID is allowed
> here, but an empty <q> with @sID or @eID is not". But in truth, you
> probably want them allowed in (mostly) the same places, in which case
> even rule-based assertions are probably not going to help.

For the record, no, I was not concerned about the contexts in which
this or that form of the q element might be allowed or forbidden to appear:
one of the large advantages of Trojan Horse markup in the cases illustrated
in Steve DeRose’s original paper is that no change to the parent content
models is needed, assuming that the beginning or ending of quoted material
should be valid in all and only those places where quoted material may
validly appear.   I think that’s a reasonable base assumption and I would
be puzzled if it did not hold; if there is some reason for it not to hold, then
that would seem on its face to be an argument that there are two distinct
kinds of quoted material here, which might better be given distinct generic
identifiers.

Looking at what I wrote, it’s no longer clear to me exactly what I had in
mind.   I may possibly have meant that the absence of a matching
@eID should make the presence of @sID invalid (just as the absence
of a matching @sID makes an @eID and the element carrying it invalid);
I don’t see a way to enforce this constraint with DTDs, Relax NG, or
XSD 1.0, although I may be overlooking something obvious.  

I may possibly have meant the entire complex of validity conditions like
the rule that in valid instances the presence of @sID entails the absence
of content and of @eID, and the presence of @eID entails the
absence of all content and all other attributes.  In this case, I was thinking
sloppily, because such rules do seem expressible in RNG, though
not (as far as I have yet noticed) in DTDs or XSD 1.0.
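
For concreteness, the per-element rules just listed can be written down as an application-level check. The Python below is only a sketch of that idea, not a substitute for expressing the rules in RNG or Schematron; the function name marker_rule_violations and the un-namespaced sample are assumptions, and "content" is taken to mean any text or child element at all.

```python
import xml.etree.ElementTree as ET

def marker_rule_violations(xml_text):
    """List violations of the per-element rules for sID/eID markers."""
    violations = []
    for el in ET.fromstring(xml_text).iter():
        sid, eid = el.get("sID"), el.get("eID")
        if sid is None and eid is None:
            continue  # ordinary element, nothing to check
        label = f"<{el.tag}> marker '{sid if sid is not None else eid}'"
        # Rule: a marker (start or end) has no content.
        if el.text or len(el) > 0:
            violations.append(f"{label} must be empty")
        # Rule: @sID excludes @eID; @eID excludes every other attribute.
        if sid is not None and eid is not None:
            violations.append(f"{label} carries both sID and eID")
        elif eid is not None and len(el.attrib) > 1:
            violations.append(f"{label} carries attributes besides eID")
    return violations

# Hypothetical sample containing one instance of each violation.
doc = ('<p><q sID="a" who="#x"/>hi<q eID="a" who="#x"/>'
       '<q sID="b">text</q><q sID="c" eID="c"/></p>')
for problem in marker_rule_violations(doc):
    print(problem)
```

The cross-element condition (every @sID has a matching @eID and vice versa) is a separate matter and is not covered here.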

I don’t think I had in mind the principle that you seem to appeal to, namely
that no mechanical test can check to see what was humanly intended.
That is true, but not specific to any particular markup construct or
schema language.


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: Fwd: trojan horse

Peter Flynn-8
In reply to this post by Lou Burnard-6
On 26/10/17 14:30, Lou Burnard wrote:
> Hi Peter
> I'm on a eurostar rather than a boat, so I'll be brief.

I was delayed for unrelated reasons :-)

> I think you're missing a rather crucial point of Michael's proposal: the
> linking between the two ends of the horse is effected by co-reference,
> not by ID/IDREF, nor by URI.

I'm sorry, I was less than explicit. I realise that; I just think it's A
Bad Idea™.

> The reason I say it's a bad idea to special-case <q/> (with appropriate
> attributes) is that it reduces my ability to introduce a genuinely empty
> <q> in a document without ambiguity.

It makes it very slightly more complex, perhaps.

> Your objection to using a different namespace is quaint, but I think
> you'll find that train has already left the station.

The objection is not to using an additional namespace, but to using one
for this purpose. Having to drag in a namespace merely to handle overlap
seems to me overkill.

> A different
> namespace is precisely what you need in order to make clear that your
> usage is not a TEI-specified one.

I was hoping that we could achieve a solution without going outside the
bounds of the TEI specification.

> Why needlessly obscure the primary
> goal of all this effort, which is to make your coding comprehensible to
> someone else.

Comprehensibility is good, but I'm unconvinced that making the coding
more complex than it need be contributes to it.

> If I find a <tag> in a document which is in the TEI
> namespace, I expect that element to respect the semantics and syntax
> defined by the TEI.

Amen. Here endeth the first Lesson.

///Peter

Re: trojan horse

Peter Flynn-8
In reply to this post by C. M. Sperberg-McQueen
On 26/10/17 18:28, C. M. Sperberg-McQueen wrote:
> [...] And also true that the design in DeRose 2004 uses sID rather
> than id or xml:id for a reason:  if one uses an already existent
> attribute for the ID, one needs to add some other signal that this
> element is a Trojan Horse start-tag, not a conventional sole-tag.
I was hoping, perhaps in vain, that a q with an @xml:id for which there
exists a later q with an @eID that matches the @xml:id is sufficient
grounds for detecting this. That allows as many solo qs with @xml:id
(and no matching later @eIDs) as you want.
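
A minimal sketch of the detection I have in mind, in Python, purely as an illustration: a q counts as a start marker only if some later q has an @eID equal to its @xml:id. The function name classify_q and the un-namespaced q in the sample are assumptions; real TEI documents would need the TEI namespace on the element name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def classify_q(xml_text):
    """Map each xml:id found on a <q> to 'start-marker' or 'ordinary'."""
    elements = list(ET.fromstring(xml_text).iter("q"))  # document order
    result = {}
    for pos, el in enumerate(elements):
        ident = el.get(XML_ID)
        if ident is None:
            continue
        # Look only at q elements that come later in the document.
        later_end = any(e.get("eID") == ident for e in elements[pos + 1:])
        result[ident] = "start-marker" if later_end else "ordinary"
    return result

# Hypothetical sample: one overlapping span and one solo q with an xml:id.
sample = ('<p><q xml:id="a"/>quoted <q xml:id="solo">whole</q>'
          ' text<q eID="a"/></p>')
print(classify_q(sample))
```

The cost is visible in the code: deciding what a given q is requires a look ahead through the rest of the document, which is the extra complexity conceded below.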

Yes, it adds complexity. But you're going to get that in any solution to
overlap.

> Life is hard sometimes.

:-)

///Peter