Re: TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)

MLH

Dear Emmanuelle,

My understanding is that, of the two numbers in @writtenLines (and likewise @ruledLines), the first gives the minimum number of lines per page or column in the codex as a whole, and the second gives the maximum.

e.g.

columns="2" writtenLines="20 30"

would not mean that column a had 20 lines and column b 30 lines, but that the codex as a whole had 2 columns per page and between 20 and 30 lines per column throughout.

Similarly,

columns="1" writtenLines="20 30"

would mean a codex in long lines with between 20 and 30 lines per page throughout.
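
Read this way, the attribute value can be unpacked mechanically. Here is a minimal Python sketch, assuming that a single number means a constant line count (that is an assumption for illustration, not something stated above); the function name parse_written_lines is hypothetical:

```python
from typing import Tuple


def parse_written_lines(value: str) -> Tuple[int, int]:
    """Return (minimum, maximum) lines per page or column for the whole codex.

    Assumption: a single number is treated as a fixed count, so minimum and
    maximum are equal. Two numbers are read as "minimum maximum".
    """
    numbers = [int(token) for token in value.split()]
    if len(numbers) == 1:
        return numbers[0], numbers[0]
    if len(numbers) == 2:
        low, high = numbers
        if low > high:
            raise ValueError("minimum exceeds maximum: %r" % value)
        return low, high
    raise ValueError("expected one or two numbers, got: %r" % value)


if __name__ == "__main__":
    # columns="2" writtenLines="20 30": between 20 and 30 lines per column.
    print(parse_written_lines("20 30"))
```

Note that the number of columns never enters the calculation: the pair describes a range for the codex, not one value per column.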

Best wishes,

Matthew




From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of TEI-L automatic digest system <[hidden email]>
Sent: 05 May 2017 04:00
To: [hidden email]
Subject: TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)
 
There are 10 messages totaling 2938 lines in this issue.

Topics of the day:

  1. "May contain: Empty element"
  2. question about attribute @writtenLines (metadata)
  3. @xml:base with @rendition (and maybe other pointers) (8)

----------------------------------------------------------------------

Date:    Thu, 4 May 2017 11:28:58 +0100
From:    Lou Burnard <[hidden email]>
Subject: Re: "May contain: Empty element"

On 03/05/17 16:19, John P. McCaskey wrote:
> When the spec for an element says “May contain: Empty element” that
> does not actually mean the spec’d element can contain any other
> element as long as the contained one is empty, right? It actually
> means “May contain: No other element,” right?
>
> Should that be changed?
>
>

The "may contain" part is boilerplate text provided by the stylesheet
in various languages: it might be tricky to change it. The other part is
meant to imply that the element in question cannot contain anything,
i.e. it is empty. Since there is now a proposal in the works (see
https://github.com/TEIC/TEI/issues/1596) to allow for an empty content
model to be represented in an ODD by using an explicit element called
"<empty/>", that text may need to be revisited. You might like to
raise a Stylesheets ticket to remind someone to review it.


------------------------------

Date:    Thu, 4 May 2017 12:45:53 +0200
From:    Emmanuelle Morlock <[hidden email]>
Subject: question about attribute @writtenLines (metadata)

Dear list,

I was just wondering why the attribute @writtenLines on layout can contain either one or two numbers (to represent the number of lines of one or two columns), but not more.

My question is mainly out of curiosity; I don’t have any use case. It just seems slightly odd (though I assume that in the manuscript world it’s very rare to have more than two columns…), but what if? What would you do if you had more than two columns?

Thanks!
Best, 
-- 
Emmanuelle Morlock
IE CNRS - Humanités numériques & TEI (Text Encoding Initiative)
UMR 5189 HISoMA (Histoire et Sources des Mondes antiques) - Lyon
http://www.hisoma.mom.fr/annuaire/morlock-emmanuelle

06 85 84 69 16
@emma_morlock

------------------------------

Date:    Thu, 4 May 2017 10:16:19 -0400
From:    "John P. McCaskey" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

I asked about this over at XML-DEV,
http://lists.xml.org/archives/xml-dev/201705/msg00008.html.

Opinion there is with me and opposite the majority here.

Readers there don’t seem to think there is anything to debate. I got one
short answer and one dismissive comment about “questions that simple.”
Someone take a look and be sure I didn’t word my question unfairly.

Whichever interpretation TEI adopts, it sounds like it should be documented
in the Guidelines or a note somewhere.

If the McCaskey/XML-DEV interpretation is adopted, the GitHub issue I
posted about @rendition stands. If not, that issue goes away.

John



On 5/3/2017 8:03 PM, Hugh Cayless wrote:
> Those are the cards we've been dealt, yeah.
>
> Sent from my phone.
>
> On May 3, 2017, at 19:53, John P. McCaskey <[hidden email]> wrote:
>
>> So, bottom line: A standalone fragment identifier refers to the
>> loaded document and all the xml:base values above it in the hierarchy
>> are irrelevant. Encoders cannot use xml:base to direct a standalone
>> #fragment value to a location outside the loaded document.
>>
>> Is that right?
>>
>> --
>>
>>
>> On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>>> No. This clarifies the expected behavior when (e.g.) you have
>>> @xml:base="#frag". It says nothing whatever about <p
>>> rendition="#foo">. I don't think these specifications interact in
>>> the way you're positing. Quite the opposite.
>>>
>>> On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey <[hidden email]> wrote:
>>>
>>>     The special behavior of same-document references in RFC 3986 is
>>>     disavowed by W3C:
>>>
>>>         4.4 Interpretation of same-document references
>>>
>>>         RFC 3986 defines certain relative URI references, in
>>>         particular the empty string and those of the form #fragment,
>>>         as same-document references. Dereferencing of same-document
>>>         references is handled specially. However, their use as the
>>>         value of an *xml:base attribute does not involve*
>>>         dereferencing, and XML Base processors should resolve them
>>>         in the usual way. In particular, xml:base="" does not reset
>>>         the base URI to that of the containing document.
>>>
>>>         Note:
>>>
>>>         Some existing processors do treat these xml:base values as
>>>         resetting the base URI to that of the containing document,
>>>         so the use of such values is strongly discouraged.
>>>
>>>     This says:
>>>
>>>         RFC 3986 defines special “dereferencing” of empty strings
>>>         and #fragments. But over here in XML-land, we don’t do
>>>         “dereferencing.” That’s not a word we use here. Ignore that
>>>         stuff about same-document references. Just resolve empty
>>>         strings and #fragments as specified in the W3C
>>>         Recommendation above. And those of you who did carry that
>>>         stuff over from 3986 to XML-land, shame on you. You messed
>>>         things up for the rest of us.
>>>
>>>
>>>     A note has:
>>>
>>>         5. The meanings of xml:base="" and xml:base="#frag" have
>>>         been clarified;
>>>
>>>     This says:
>>>
>>>         In this second version of this recommendation, we added
>>>         paragraph 4.4 specifically to get you 3986 people to stop
>>>         polluting our W3C with your special cases. Stop doing that.
>>>
>>>     No?
>>>
>>>     -- John
>>>
>>>
>>>
>>>
>>>
>>>
>>>     On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>>>     That's exactly what it does. It's just that the behavior of
>>>>     same-document references is prescribed in such a way that they
>>>>     end up resolving to the current document regardless of the
>>>>     value of the @xml:base.
>>>>
>>>>     Put another way, @xml:base has an influence on what the client
>>>>     application will retrieve when it dereferences a URI and
>>>>     retrieves the referenced document, but in the case of
>>>>     same-document references no such retrieval is expected to
>>>>     occur—it's assumed the client already has the document.
>>>>
>>>>     On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey <[hidden email]> wrote:
>>>>
>>>>         Oh. I thought it obvious they all resolved the same.
>>>>
>>>>         I think the same about this one. You think otherwise?
>>>>
>>>>         <div xml:base="http://www.myteiproject.com/">
>>>>           <div xml:base="images/">
>>>>             <div>
>>>>               <graphic url="logo.jpg"/>
>>>>             </div>
>>>>             <div xml:base="http://www.dictionary.com/words/">
>>>>               <p xml:base="a.html">
>>>>                 <ref target="#apple">apple</ref>
>>>>               </p>
>>>>             </div>
>>>>           </div>
>>>>         </div>
>>>>
>>>>         I didn’t think xml:base had any necessary relation to the
>>>>         current document. I thought in XML (not HTML), it just sets
>>>>         a path for URIs below it.
>>>>
>>>>         --
>>>>
>>>>         On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>>>         Well, we don't know about the third one, except that it
>>>>>         points to whatever element *in the same document* has the
>>>>>         @xml:id "apple". When an application attempts to
>>>>>         dereference it, it should make the URI in @ref absolute
>>>>>         and check it against the base
>>>>>         (http://www.dictionary.com/words/a.html), discover they
>>>>>         are identical, decide it doesn't need to fetch anything,
>>>>>         and go looking in the current document for the element
>>>>>         with the id "apple".
>>>>>         On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey <[hidden email]> wrote:
>>>>>
>>>>>             Are people proposing that these targets do not all
>>>>>             resolve to the same
>>>>>             http://www.dictionary.com/words/a.html#apple?
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/">
>>>>>                 <p xml:base="words/">
>>>>>                     <ref target="a.html#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/words/">
>>>>>                 <p xml:base="a.html">
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/words/a.html">
>>>>>                 <p>
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/">
>>>>>                 <p>
>>>>>                     <ref target="words/a.html#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>             --
>>>>>
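
For reference, the purely mechanical part of the question quoted above (what each of the four variants resolves to when every xml:base is applied in turn) can be sketched in Python with the standard library's RFC 3986 style resolver. The helper name resolve_through_bases is hypothetical, and the sketch deliberately says nothing about whether a client should then look in the current document or fetch anything, which is the disputed point in this thread.

```python
from functools import reduce
from urllib.parse import urljoin


def resolve_through_bases(bases, target):
    """Fold nested xml:base values (outermost first), then resolve target."""
    effective_base = reduce(urljoin, bases)
    return urljoin(effective_base, target)


# Each entry: (xml:base values from outer to inner element, @target value).
VARIANTS = [
    (["http://www.dictionary.com/", "words/"], "a.html#apple"),
    (["http://www.dictionary.com/words/", "a.html"], "#apple"),
    (["http://www.dictionary.com/words/a.html"], "#apple"),
    (["http://www.dictionary.com/"], "words/a.html#apple"),
]

resolved = [resolve_through_bases(bases, target) for bases, target in VARIANTS]

if __name__ == "__main__":
    for uri in resolved:
        print(uri)
```

Under plain resolution all four variants yield http://www.dictionary.com/words/a.html#apple.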

------------------------------

Date:    Thu, 4 May 2017 11:37:19 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 3, 2017, at 5:53 PM, John P. McCaskey <[hidden email]> wrote:
>
> So, bottom line: A standalone fragment identifier refers to the loaded document and all the xml:base values above it in the hierarchy are irrelevant. Encoders cannot use xml:base to direct a standalone #fragment value to a location outside the loaded document.
>
> Is that right?

Yes and no.  For some purposes, the ways in which the answer is 'no'
are pedantic and can be ignored; for others they seem important.
I did not respond to your summary yesterday, because objecting to
the wording you used seemed unnecessarily pedantic.  Also, I
overlooked the second sentence, which I think is the wrong conclusion
to draw.

Given the following fragment of resource http://example.org/eg.xml

  <div xml:base="http://www.dictionary.com/a.html">
    <p>
      <ref target="#apple">Apple</ref>
      <ref target="a.html#avocado">Avocado</ref>
      <ref target="http://www.dictionary.com/a.html#anise">Anise</ref>
    </p>
  </div>

we can consider two ways of interpreting the target attributes.

Note that the discussion below ignores some possibly salient facts:

  - URIs can denote different resources at different moments.
    (The discussion assumes the URI-resource mapping is not changing.)

  - A given resource can have multiple representations.  (The
    discussion ignores any resulting complications.)

  - URIs whose path component ends in .xml and .html do not
    necessarily have particular MIME types, so there is no guarantee
    that a fragment identifier like #apple will have similar
    meanings.  (The discussion assumes that fragment identifiers
    point to elements assigned IDs by the HTML 'id' attribute
    and/or the xml:id attribute.)

Interpretation 1 ('xml:base values are irrelevant').  Ignore xml:base
when resolving '#apple' [but not when resolving other relative
references].  The target attributes are interpreted as denoting

    (a) http://example.org/eg.xml#apple
    (b) http://www.dictionary.com/a.html#avocado
    (c) http://www.dictionary.com/a.html#anise

None of these have anything to do with any of:

    (d) http://www.dictionary.com/a.html#apple
    (e) http://example.org/eg.xml#avocado
    (f) http://example.org/eg.xml#anise   

Of these, (a) is a same-document reference and RFC 3986 says it
"should" be dereferenced without a new retrieval action.  If a new
retrieval action is nevertheless launched, the resource retrieved is
(a).

[It is not clear to me whether the XVAI ('xml:base values are
irrelevant') interpretation takes a position on whether any of these
other than (a) are same-document references.]

Interpretation 2 ('same-document references can be surprising', or
SRCBS).  Resolve all relative references against the base URI in the
usual way.  Dereference same-document references either by looking in
the same document (as recommended by RFC 3986) or by launching a new
retrieval operation.

The target attributes are resolved to the absolute forms

    (d) http://www.dictionary.com/a.html#apple
    (b) http://www.dictionary.com/a.html#avocado
    (c) http://www.dictionary.com/a.html#anise

All of these are same-document references, so according to RFC 3986,
they should be dereferenced without a new retrieval action.  If a
retrieval action is nevertheless launched, it will go to URIs (d),
(b), (c) respectively, not (a), (e), (f).  From the fact that (d),
(b), and (c) can be dereferenced without a new retrieval, it follows
(as far as I can tell) that these three resources can also be denoted
by URIs (a), (e), (f).

The XVAI and SRCBS interpretations agree on the following proposition,
which has important relevance for operations on the data:

P1 The relative reference target="#apple" can be dereferenced by
locating the element in the current document with xml:id="apple", if
such an element exists.

For people whose main interest is the truth or falsity of that
proposition, then, the answer is "yes, that's right" -- the effect is
the same, and all else is just pilpul.

From P1, it follows (I think) that

P2 The resource identified by target="#apple" is identified by the
absolute URI http://example.org/eg.xml#apple ((a) above).

The two interpretations disagree, or seem to disagree, on a number of
other propositions, most obviously:

P3 The relative reference '#apple' does not identify the resource
identified by http://www.dictionary.com/a.html#apple.

XVAI does not actually entail P3, but it is compatible with P3.  (To
reach P3 it is necessary to assume some rule like "No two URIs
identify the same thing" or "If we don't know that a URI identifies a
thing, then it does not identify that thing.")  SRCBS entails the
negation of P3.

They also prescribe different URIs for the case that software
determines to perform a fresh retrieval action for the relative
reference #apple: XVAI prescribes the absolute URI (a), SRCBS
prescribes URI (d).
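
The difference in prescribed URIs can be made concrete with a small Python sketch. It assumes the example fragment given earlier (document at http://example.org/eg.xml, xml:base of http://www.dictionary.com/a.html); the function names resolve_xvai and resolve_srcbs are hypothetical labels for the two interpretations, not anything defined by a specification.

```python
from urllib.parse import urljoin

# URI of the document that contains the fragment (from the example above).
DOCUMENT_URI = "http://example.org/eg.xml"
# Base URI in force inside the <div>, taken from its xml:base attribute.
BASE_URI = "http://www.dictionary.com/a.html"

TARGETS = [
    "#apple",
    "a.html#avocado",
    "http://www.dictionary.com/a.html#anise",
]


def resolve_xvai(target: str) -> str:
    """Interpretation 1: ignore xml:base for bare '#fragment' references."""
    base = DOCUMENT_URI if target.startswith("#") else BASE_URI
    return urljoin(base, target)


def resolve_srcbs(target: str) -> str:
    """Interpretation 2: resolve every reference against the base URI."""
    return urljoin(BASE_URI, target)


xvai = [resolve_xvai(target) for target in TARGETS]
srcbs = [resolve_srcbs(target) for target in TARGETS]

if __name__ == "__main__":
    for target, one, two in zip(TARGETS, xvai, srcbs):
        print(f"{target!r}: XVAI {one} | SRCBS {two}")
```

Only the first target differs: XVAI yields (a), SRCBS yields (d); the other two resolve to (b) and (c) under both readings.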

I don’t believe anyone has seriously suggested XVAI as the
relevant rule of interpretation for examples like the one given;
what I have suggested (and I have understood Hugh Cayless
to be agreeing with) is SRCBS.  Operationally, they can have
similar results in some circumstances (specifically:  they can
both result in no new retrieval action being undertaken in order
to dereference ‘#apple’), but they differ in ways which can be
critical.


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 14:28:37 -0400
From:    Hugh Cayless <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

That's a rather favorable interpretation on your part. One person agrees
with you without elucidating, one says this discussion has jumped the shark
(which is fair), and the third (Michael Kay) gives a fuller answer which
adds up to "it depends". Michael Kay is quite correct that in the context
where a document retrieval is *expected* to occur, the URI would indeed be
computed with reference to its base and fetched.

The thing is, I'm not aware of any TEI attributes or element/attribute
combinations which are defined as *forcing* a retrieval action. I'd be
happy to be corrected if I'm missing any, of course.

It's fair to ask not just how one might expect them to behave, but what
same-document references *mean* in the context of TEI documents with
@xml:base. I agree this is something we ought to make clear. I think there
is some possibility of wiggle room, given that TEI has its own media type.
But I also think that we'd be better off adhering to the letter of RFC
3986. The use of same-document references in TEI documents is ubiquitous,
and I'm firmly against anything that might break them.

For what it's worth, modern web browsers seem to agree with your
interpretation (mutatis mutandis; HTML base is not @xml:base). As far as I
can tell, this is probably because of a desire on the part of the Mozilla
developers back in the day to maintain compatibility with IE 4(!).[1]

To further complicate matters, the author of RFC 3986, Roy Fielding, has
said that using @xml:base in the way you propose, i.e. to enable shorthand
references rather than to set a canonical URI for the current document, is
abusive.[2]

Given all this, I still agree with Michael Sperberg-McQueen's fuller
explication of the issues at hand. I would interpret same-document
references as pointing to the current document, with the caveat that there
might be, now or in the future, certain pointer attributes or
element/attribute combinations that mandate a retrieval action. Such a
retrieval would necessarily use whatever base was defined for the URI in
question.
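
The RFC 3986 test for a same-document reference (the resolved reference equals the base URI once fragments are set aside) can be sketched in Python as follows. The function name is hypothetical, and the plain string comparison is a simplification: no URI normalization is attempted.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document_reference(target: str, base_uri: str) -> bool:
    """True if target, resolved against base_uri, differs at most by fragment."""
    absolute = urljoin(base_uri, target)
    return urldefrag(absolute).url == urldefrag(base_uri).url


if __name__ == "__main__":
    base = "http://www.dictionary.com/words/a.html"
    for target in ("#apple", "a.html#apple", "b.html#banana"):
        print(target, is_same_document_reference(target, base))
```

A client following the reading above would look in the current document when the function returns True, and would fetch the resolved URI only where a retrieval is actually mandated.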

References:
1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris

On Thu, May 4, 2017 at 10:16 AM, John P. McCaskey <[hidden email]>
wrote:

> I asked about this over at XML-DEV,
> http://lists.xml.org/archives/xml-dev/201705/msg00008.html.
>
> Opinion there is with me and opposite the majority here.
>
> Readers there don’t seem to think there is anything to debate. I got one
> short answer and one dismissive comment about “questions that simple.”
> Someone take a look and be sure I didn’t word my question unfairly.
> Whichever interpretation TEI adopts, sounds like it should be documented
> in the Guidelines or a note somewhere.
>
> If the McCaskey/XML-DEV interpretation is adopted, the GitHub issue I
> posted about @rendition stands. If not, that issue goes away.
>
> John
>
>
>
> On 5/3/2017 8:03 PM, Hugh Cayless wrote:
>
> Those are the cards we've been dealt, yeah.
>
> Sent from my phone.
>
> On May 3, 2017, at 19:53, John P. McCaskey <[hidden email]>
> wrote:
>
> So, bottom line: A standalone fragment identifier refers to the loaded
> document and all the xml:base values above it in the hierarchy are
> irrelevant. Encoders cannot use xml:base to direct a standalone #fragment
> value to a location outside the loaded document.
>
> Is that right?
>
> --
>
>
> On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>
> No. This clarifies the expected behavior when (e.g.) you have @xml:base="#frag".
> It says nothing whatever about <p rendition="#foo">. I don't think these
> specifications interact in the way you're positing. Quite the opposite.
>
> On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey <[hidden email]> wrote:
>
>> The special behavior of same-document references in RFC 3986 is disavowed
>> by W3C:
>>
>> 4.4 Interpretation of same-document references
>>
>> RFC 3986 defines certain relative URI references, in particular the empty
>> string and those of the form #fragment, as same-document references.
>> Dereferencing of same-document references is handled specially. However,
>> their use as the value of an *xml:base attribute does not involve* dereferencing,
>> and XML Base processors should resolve them in the usual way. In
>> particular, xml:base="" does not reset the base URI to that of the
>> containing document.
>>
>> Note:
>>
>> Some existing processors do treat these xml:base values as resetting the
>> base URI to that of the containing document, so the use of such values is
>> strongly discouraged.
>>
>> This says:
>>
>> RFC 3986 defines special “dereferencing” of empty strings and #fragments.
>> But over here in XML-land, we don’t do “dereferencing.” That’s not a word
>> we use here. Ignore that stuff about same-document references. Just resolve
>> empty strings and #fragments as specified in the W3C Recommendation above.
>> And those of you who did carry that stuff over from 3986 to XML-land, shame
>> on you. You messed things up for the rest of us.
>>
>>
>> A note has:
>>
>> 5. The meanings of xml:base="" and xml:base="#frag" have been clarified;
>>
>> This says:
>>
>> In this second version of this recommendation, we added paragraph 4.4
>> specifically to get you 3986 people to stop polluting our W3C with your
>> special cases. Stop doing that.
>>
>> No?
>>
>> -- John
>>
>>
>>
>>
>>
>>
>> On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>
>> That's exactly what it does. It's just that the behavior of same-document
>> references is prescribed in such a way that they end up resolving to the
>> current document regardless of the value of the @xml:base.
>>
>> Put another way, @xml:base has an influence on what the client
>> application will retrieve when it dereferences a URI and retrieves the
>> referenced document, but in the case of same-document references no such
>> retrieval is expected to occur—it's assumed the client already has the
>> document.
>>
>> On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey <
>> [hidden email]> wrote:
>>
>>> Oh. I thought it obvious they all resolved the same.
>>>
>>> I think the same about this one. You think otherwise?
>>>
>>> <div xml:base="http://www.myteiproject.com/">
>>>   <div xml:base="images/">
>>>     <div>
>>>       <graphic url="logo.jpg"/>
>>>     </div>
>>>     <div xml:base="http://www.dictionary.com/words/">
>>>       <p xml:base="a.html">
>>>         <ref target="#apple">apple</ref>
>>>       </p>
>>>     </div>
>>>   </div>
>>> </div>
>>>
>>> I didn’t think xml:base had any necessary relation to the current
>>> document. I thought in XML (not HTML), it just sets a path for URIs below
>>> it.
>>>
>>> --
>>> On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>
>>> Well, we don't know about the third one, except that it points to
>>> whatever element *in the same document* has the @xml:id "apple". When
>>> an application attempts to dereference it, it should make the URI in @ref
>>> absolute and check it against the base
>>> (http://www.dictionary.com/words/a.html), discover they are identical,
>>> decide it doesn't need to fetch anything, and go looking in the current
>>> document for the element with the id "apple".
>>> On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey <
>>> [hidden email]> wrote:
>>>>
>>>> Are people proposing that these targets do not all resolve
>>>> to the same http://www.dictionary.com/words/a.html#apple?
>>>> <div xml:base="http://www.dictionary.com/">
>>>>     <p xml:base="words/">
>>>>         <ref target="a.html#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/words/">
>>>>     <p xml:base="a.html">
>>>>         <ref target="#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/words/a.html">
>>>>     <p>
>>>>         <ref target="#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/">
>>>>     <p>
>>>>         <ref target="words/a.html#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>> --
>>>>
>>>>

------------------------------

Date:    Thu, 4 May 2017 15:31:42 -0400
From:    "John P. McCaskey" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

What is the short bottom-line guidance for someone trying to encode a
document? Is it this?

    To point inside a TEI document, as for @rendition, use pointers of
    the form #destination. Any later changes to xml:base values will not
    mess up your internal # pointers.

    Don’t try to point outside your TEI document by using xml:base plus
    a #destination in the pointer attribute. No xml:base attribute will
    be prepended to a pointer that begins with a #.
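
A rough Python sketch of that guidance, assuming a processor that treats any pointer beginning with # as local and resolves everything else against the inherited xml:base values. The function resolve_pointer, the sample markup, and the document URI are all hypothetical illustrations, not TEI-defined behavior.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS
XML_BASE = "{%s}base" % XML_NS

# Hypothetical sample: one internal pointer and one external pointer.
SAMPLE = """
<text>
  <rendition xml:id="foo">font-style: italic</rendition>
  <div xml:base="http://www.dictionary.com/words/">
    <p rendition="#foo">
      <ref target="a.html#apple">apple</ref>
    </p>
  </div>
</text>
"""

ROOT = ET.fromstring(SAMPLE)


def resolve_pointer(root, element, pointer, document_uri):
    """Return a local element for '#id' pointers, else an absolute URI string."""
    if pointer.startswith("#"):
        # Rule 1: '#destination' always means "inside this document";
        # xml:base values are never consulted.
        wanted = pointer[1:]
        for candidate in root.iter():
            if candidate.get(XML_ID) == wanted:
                return candidate
        return None
    # Rule 2: everything else is resolved against the inherited xml:base chain.
    parents = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = element
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    base = document_uri
    for node in reversed(chain):  # outermost ancestor first
        base = urljoin(base, node.get(XML_BASE, ""))
    return urljoin(base, pointer)


if __name__ == "__main__":
    doc_uri = "http://example.org/eg.xml"
    p = ROOT.find(".//p")
    ref = ROOT.find(".//ref")
    print(resolve_pointer(ROOT, p, p.get("rendition"), doc_uri).text)
    print(resolve_pointer(ROOT, ref, ref.get("target"), doc_uri))
```

With this rule, moving the p element under a different xml:base changes where the ref target points but leaves the rendition pointer untouched.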

Even if I and others find that idiosyncratic and surprising, it’s
unambiguous, has practical benefits, requires no knowledge of RFCs or
W3C specs, is easy to articulate, and is easy to encode to.

John



On 5/4/2017 2:28 PM, Hugh Cayless wrote:
> That's a rather favorable interpretation on your part. One person
> agrees with you without elucidating, one says this discussion has
> jumped the shark (which is fair), and the third (Michael Kay) gives a
> fuller answer which adds up to "it depends". Michael Kay is quite
> correct that in the context where a document retrieval is *expected*
> to occur, the URI would indeed be computed with reference to its base
> and fetched.
>
> The thing is, I'm not aware of any TEI attributes or element/attribute
> combinations which are defined as *forcing* a retrieval action. I'd be
> happy to be corrected if I'm missing any, of course.
>
> It's fair to ask not just how one might expect them to behave, but
> what same-document references *mean* in the context of TEI documents
> with @xml:base. I agree this is something we ought to make clear. I
> think there is some possibility of wiggle room, given that TEI has its
> own media type. But I also think that we'd be better off adhering to
> the letter of RFC 3986. The use of same-document references in TEI
> documents is ubiquitous, and I'm firmly against anything that might
> break them.
>
> For what it's worth, modern web browsers seem to agree with your
> interpretation (mutatis mutandis—HTML base is not @xml:base). As far
> as I can tell, probably because of a desire on the part of the Mozilla
> developers back in the day to maintain compatibility with IE 4(!).[1]
>
> To further complicate matters, the author of RFC 3986, Roy Fielding,
> has said that using @xml:base in the way you propose, i.e. to enable
> shorthand references rather than to set a canonical URI for the
> current document, is abusive.[2]
>
> Given all this, I still agree with Michael Sperberg-McQueen's fuller
> explication of the issues at hand. I would interpret same-document
> references as pointing to the current document, with the caveat that
> there might be, now or in the future, certain pointer attributes or
> element/attribute combinations that mandate a retrieval action. Such a
> retrieval would necessarily use whatever base was defined for the URI
> in question.
>
> References:
> 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris
>
> On Thu, May 4, 2017 at 10:16 AM, John P. McCaskey <[hidden email]> wrote:
>
>     I asked about this over at XML-DEV,
>     http://lists.xml.org/archives/xml-dev/201705/msg00008.html.
>
>     Opinion there is with me and opposite the majority here.
>
>     Readers there don’t seem to think there is anything to debate. I
>     got one short answer and one dismissive comment about “questions
>     that simple.” Someone take a look and be sure I didn’t word my
>     question unfairly.
>
>     Whichever interpretation TEI adopts, sounds like it should be
>     documented in the Guidelines or a note somewhere.
>
>     If the McCaskey/XML-DEV interpretation is adopted, the GitHub
>     issue I posted about @rendition stands. If not, that issue goes away.
>
>     John
>
>
>
>     On 5/3/2017 8:03 PM, Hugh Cayless wrote:
>>     Those are the cards we've been dealt, yeah.
>>
>>     Sent from my phone.
>>
>>     On May 3, 2017, at 19:53, John P. McCaskey <[hidden email]> wrote:
>>
>>>     So, bottom line: A standalone fragment identifier refers to the
>>>     loaded document and all the xml:base values above it in the
>>>     hierarchy are irrelevant. Encoders cannot use xml:base to direct
>>>     a standalone #fragment value to a location outside the loaded
>>>     document.
>>>
>>>     Is that right?
>>>
>>>     --
>>>
>>>
>>>     On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>>>>     No. This clarifies the expected behavior when (e.g.) you have
>>>>     @xml:base="#frag". It says nothing whatever about <p
>>>>     rendition="#foo">. I don't think these specifications interact
>>>>     in the way you're positing. Quite the opposite.
>>>>
>>>>     On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey <[hidden email]> wrote:
>>>>
>>>>         The special behavior of same-document references in RFC
>>>>         3986 is disavowed by W3C:
>>>>
>>>>             4.4 Interpretation of same-document references
>>>>
>>>>             RFC 3986 defines certain relative URI references, in
>>>>             particular the empty string and those of the form
>>>>             #fragment, as same-document references. Dereferencing
>>>>             of same-document references is handled specially.
>>>>             However, their use as the value of an *xml:base
>>>>             attribute does not involve* dereferencing, and XML
>>>>             Base processors should resolve them in the usual way.
>>>>             In particular, xml:base="" does not reset the base URI
>>>>             to that of the containing document.
>>>>
>>>>             Note:
>>>>
>>>>             Some existing processors do treat these xml:base values
>>>>             as resetting the base URI to that of the containing
>>>>             document, so the use of such values is strongly
>>>>             discouraged.
>>>>
>>>>         This says:
>>>>
>>>>             RFC 3986 defines special “dereferencing” of empty
>>>>             strings and #fragments. But over here in XML-land, we
>>>>             don’t do “dereferencing.” That’s not a word we use
>>>>             here. Ignore that stuff about same-document references.
>>>>             Just resolve empty strings and #fragments as specified
>>>>             in the W3C Recommendation above. And those of you who
>>>>             did carry that stuff over from 3986 to XML-land, shame
>>>>             on you. You messed things up for the rest of us.
>>>>
>>>>
>>>>         A note has:
>>>>
>>>>             5. The meanings of xml:base="" and xml:base="#frag"
>>>>             have been clarified;
>>>>
>>>>         This says:
>>>>
>>>>             In this second version of this recommendation, we added
>>>>             paragraph 4.4 specifically to get you 3986 people to
>>>>             stop polluting our W3C with your special cases. Stop
>>>>             doing that.
>>>>
>>>>         No?
>>>>
>>>>         -- John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>         On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>>>>         That's exactly what it does. It's just that the behavior
>>>>>         of same-document references is prescribed in such a way
>>>>>         that they end up resolving to the current document
>>>>>         regardless of the value of the @xml:base.
>>>>>
>>>>>         Put another way, @xml:base has an influence on what the
>>>>>         client application will retrieve when it dereferences a
>>>>>         URI and retrieves the referenced document, but in the case
>>>>>         of same-document references no such retrieval is expected
>>>>>         to occur—it's assumed the client already has the document.
>>>>>
>>>>>         On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey
>>>>>         <[hidden email]> wrote:
>>>>>
>>>>>             Oh. I thought it obvious they all resolved the same.
>>>>>
>>>>>             I think the same about this one. You think otherwise?
>>>>>
>>>>>             <div xml:base="http://www.myteiproject.com/">
>>>>>               <div xml:base="images/">
>>>>>                 <div>
>>>>>                   <graphic url="logo.jpg"/>
>>>>>                 </div>
>>>>>                 <div xml:base="http://www.dictionary.com/words/">
>>>>>                   <p xml:base="a.html">
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                   </p>
>>>>>                 </div>
>>>>>               </div>
>>>>>             </div>
>>>>>
>>>>>             I didn’t think xml:base had any necessary relation to
>>>>>             the current document. I thought in XML (not HTML), it
>>>>>             just sets a path for URIs below it.
>>>>>
>>>>>             --
>>>>>
>>>>>             On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>>>>             Well, we don't know about the third one, except that
>>>>>>             it points to whatever element *in the same document*
>>>>>>             has the @xml:id "apple". When an application attempts
>>>>>>             to dereference it, it should make the URI in @target
>>>>>>             absolute and check it against the base
>>>>>>             (http://www.dictionary.com/words/a.html), discover
>>>>>>             they are identical, decide it doesn't need to fetch
>>>>>>             anything, and go looking in the current document for
>>>>>>             the element with the id "apple".
>>>>>>
>>>>>>             On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey
>>>>>>             <[hidden email]> wrote:
>>>>>>
>>>>>>                 Are people proposing that these targets do not
>>>>>>                 all resolve to the same
>>>>>>                 http://www.dictionary.com/words/a.html#apple?
>>>>>>
>>>>>>                 <div xml:base="http://www.dictionary.com/">
>>>>>>                   <p xml:base="words/">
>>>>>>                     <ref target="a.html#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/words/">
>>>>>>                   <p xml:base="a.html">
>>>>>>                     <ref target="#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/words/a.html">
>>>>>>                   <p>
>>>>>>                     <ref target="#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/">
>>>>>>                   <p>
>>>>>>                     <ref target="words/a.html#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 --
>>>>>>

------------------------------

Date:    Thu, 4 May 2017 14:32:37 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 4, 2017, at 12:28 PM, Hugh Cayless <[hidden email]> wrote:
>
> That's a rather favorable interpretation on your part. One person agrees with you without elucidating,

I’m not sure this is true.  What Eliot Kimber said is that in the context given, ‘#apple’ identifies the same thing as ‘http://www.dictionary.com/a.html#apple’.  Does that distinguish among the various interpretations (are there just two?) of the situation offered so far?

I don’t think so.  I think the disagreement we have is not over the statement  affirmed by Eliot Kimber, but over the following two claims:

  C1 In the context described, ‘#apple’ is a same-document reference and
  can therefore by definition be dereferenced without a new retrieval action.

  C2 In the context described, ‘#apple’ does not refer to the element in the
  current document with xml:id=“apple” (if any); it  cannot be dereferenced
  without a new retrieval action.

I intend C1 as a representation of the interpretation of 3986 I’ve been offering, and C2 as a representation of the interpretation offered by John McCaskey. (SRCBS and XVAI, in my note of earlier today.)

> one says this discussion has jumped the shark (which is fair), and the third (Michael Kay) gives a fuller answer which adds up to "it depends". Michael Kay is quite correct that in the context where a document retrieval is *expected* to occur, the URI would indeed be computed with reference to its base and fetched.

The URI is *always* computed with reference to its base. 

Optimizations which produce the same result are, of course, allowed.  The preceding paragraph is a claim about the meaning of certain language constructs, not a claim about what the CPU and network controller do during evaluation of an expression by a conforming processor.

If it is then determined to be a same-document URI, the resource identified by that URI is then “defined to be within” the current document; in consequence no new retrieval is necessary and a new retrieval should be avoided.  The 'should' here means that 3986 recommends that new retrievals be avoided, but does not forbid new retrievals; if conforming processors or specs have good reason for launching new retrievals, that's not a violation of the rules of 3986.  The XSLT spec defines the document() function as always launching a new retrieval.  (Note that this does not amount to any claim by the XSLT spec that the relevant resource is not within the current document.)
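To make the two steps concrete (first resolve against the base, then ask whether the result is a same-document reference), here is a minimal Python sketch. It is only an illustration, not a normative reading of RFC 3986: it leans on the standard library's urllib.parse, the helper names are made up for this note, and it assumes each chain of xml:base values starts with an absolute URI, as in the four encodings quoted earlier in this thread.

```python
from functools import reduce
from urllib.parse import urldefrag, urljoin


def effective_base(bases):
    """Fold a chain of xml:base values (outermost first) into one base URI."""
    return reduce(urljoin, bases)


def resolve(bases, reference):
    """Resolve a reference against the effective base. No retrieval is involved."""
    return urljoin(effective_base(bases), reference)


def is_same_document(bases, reference):
    """True if the resolved URI equals the base URI once fragments are set aside."""
    base = urldefrag(effective_base(bases)).url
    target = urldefrag(resolve(bases, reference)).url
    return target == base


# The four encodings from the thread: (xml:base chain, @target value).
cases = [
    (["http://www.dictionary.com/", "words/"], "a.html#apple"),
    (["http://www.dictionary.com/words/", "a.html"], "#apple"),
    (["http://www.dictionary.com/words/a.html"], "#apple"),
    (["http://www.dictionary.com/"], "words/a.html#apple"),
]

for bases, reference in cases:
    print(resolve(bases, reference), is_same_document(bases, reference))
```

All four cases resolve to the same absolute URI; only the second and third count as same-document references under this test, which is exactly where C1 and C2 part ways.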

>
> The thing is, I'm not aware of any TEI attributes or element/attribute combinations which are defined as *forcing* a retrieval action. I'd be happy to be corrected if I'm missing any, of course.
>
> It's fair to ask not just how one might expect them to behave, but what same-document references *mean* in the context of TEI documents with @xml:base. I agree this is something we ought to make clear. I think there is some possibility of wiggle room, given that TEI has its own media type. But I also think that we'd be better off adhering to the letter of RFC 3986. The use of same-document references in TEI documents is ubiquitous, and I'm firmly against anything that might break them.

Is it clear what counts as breakage, here?

If a given interpretation of the URI specs causes some URI references to break (by which I mean: to have an interpretation different from what the encoders intended), can we be confident that a contrary interpretation will not break any?  Or is it the case that one interpretation will break some URI references, and a different interpretation will break others?

When there are two possible interpretations of a given rule in a spec, it’s seldom the case that everyone interprets it the same way.  There is some risk that your choice does not lie between breaking things in TEI documents and not breaking them, but between breaking those belonging to one project and breaking those belonging to another project.

> For what it's worth, modern web browsers seem to agree with your interpretation (mutatis mutandis—HTML base is not @xml:base). As far as I can tell, probably because of a desire on the part of the Mozilla developers back in the day to maintain compatibility with IE 4(!).[1]

Can you expound? 

Do you mean that in an HTML 4.01 or XHTML 1.0 document at http://example.com/doc.html containing <base href=“http://example.org/base.html”>, a link of the form <a href=“#quovadis”>where does this connect to?</a> will go to … where? 

I think the two obvious things one might expect to see in the address bar after traversing that link are

    http://example.com/doc.html#quovadis
    http://example.org/base.html#quovadis

In the first case, this will happen either after a document refresh or without a refresh (just scrolling to the place).

If a browser shows one of these addresses, does that constitute a claim that the other identifies a different resource?  Or does it only constitute a decision on the part of the browser about which of the two possible paths to a given resource it should take?


> To further complicate matters, the author of RFC 3986, Roy Fielding, has said that using @xml:base in the way you propose, i.e. to enable shorthand references rather than to set a canonical URI for the current document, is abusive.[2]

Thank you for that reference.

I think the rules in 3986 make perfect sense if one assumes Roy Fielding’s principle that the base URI within a document should be the base URI of the document, and that using xml:base or html:base to shorten references which would otherwise be long is not a scenario worth bending over backwards for.  (That is, it seems to me to simplify matters, rather than complicating them.)

If one doesn’t want to make that assumption, some relatively simple rules like the following might go some distance towards reducing the likelihood of unpleasant surprises:

  - Use the form “#fragment” only for references to locations in the current document.  These will always be same-document references within the meaning of RFC 3986.
  - Use xml:base to set all but the last bit of the URI, but not to set a full document URI:  xml:base=“http://dictionary.example.org/entries/” with relative references to “a.html#apple” and “a.html#anodyne” will be better than xml:base=“http://dictionary.example.org/entries/a.html” with relative references to “#apple” and “#anodyne”.  References of the first kind will never be taken to be same-document references.

Note, however, that while “#fragment” will always be a same-document reference, it will ALSO always be a reference to the given fragment in the resource identified by the base URI.  If that’s not logically the same as the resource within which the reference occurs, you’re playing with fire:  RFC 3986 says “#fragment” “should” be dereferenced without a new retrieval, not that it must be.  Any software will be perfectly within its rights to retrieve the base URI and look for the fragment there.
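A project that wants to check its own files for the "playing with fire" case could do something like the following. This is a rough sketch under stated assumptions, not an established tool: the function name and the example document URI are invented here, and "logically the same resource" is approximated by plain string equality of URIs with any fragment removed.

```python
from urllib.parse import urldefrag, urljoin


def fragment_only_is_risky(document_uri, xml_base, reference):
    """Flag a "#fragment" reference whose base URI is not the document's own URI.

    document_uri: URI the file was retrieved from (assumed known to the caller).
    xml_base:     xml:base value in scope, or None if there is none.
    reference:    the pointer value, e.g. "#apple" or "a.html#apple".
    """
    if not reference.startswith("#"):
        # Not fragment-only, so it is never a same-document reference by form.
        return False
    effective = urljoin(document_uri, xml_base or "")
    return urldefrag(effective).url != urldefrag(document_uri).url


doc = "http://example.org/tei/doc.xml"  # hypothetical location of the TEI file

print(fragment_only_is_risky(doc, None, "#apple"))
print(fragment_only_is_risky(doc, "http://dictionary.example.org/entries/a.html", "#apple"))
print(fragment_only_is_risky(doc, "http://dictionary.example.org/entries/", "a.html#apple"))
```

Only the middle call is flagged: the reference is fragment-only, yet the base URI in scope names a different resource from the document that contains it.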
 
>
> References:
> 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 14:42:11 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 4, 2017, at 1:31 PM, John P. McCaskey <[hidden email]> wrote:
>
> What is the short bottom-line guidance for someone trying to encode a document? Is it this?
>
> To point inside a TEI document, as for @rendition, use pointers of the form #destination. Any later changes to xml:base values will not mess up your internal # pointers.
>
> Don’t try to point outside your TEI document by using xml:base plus a #destination in the pointer attribute. No xml:base attribute will be prepended to a pointer that begins with a #.

No.   Sorry.  That’s not the way it works, and I think this discussion has already demonstrated that it’s a dangerous way to describe the behavior. 

If you want to keep things simple for the encoders, I’d rephrase this as something like:  Don’t try to point outside your TEI document by using xml:base plus a #destination in the pointer attribute.  It does not have the desired meaning.  Point outside your TEI document either using an absolute URI or an xml:base attribute plus the final part of the path (the ‘file name’ part) and the fragment identifier.  So NOT <ptr xml:base="lib/foo.xml" target="#bar"/> but <ptr xml:base="lib/" target="foo.xml#bar"/>.
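For anyone who wants to see the difference between the two forms, here is a small Python sketch. The document URI is hypothetical and the helper names are made up for illustration; it only looks at an xml:base on the <ptr> element itself and ignores any inherited from ancestors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
DOCUMENT_URI = "http://example.org/tei/doc.xml"  # hypothetical location of the TEI file


def resolved_target(ptr_xml, document_uri=DOCUMENT_URI):
    """Resolve @target against the element's own xml:base (if any) and the document URI."""
    ptr = ET.fromstring(ptr_xml)
    base = urljoin(document_uri, ptr.get(XML_BASE, ""))
    return urljoin(base, ptr.get("target"))


def is_fragment_only(ptr_xml):
    """True if @target has the "#fragment" form that processors may treat as same-document."""
    return ET.fromstring(ptr_xml).get("target", "").startswith("#")


discouraged = '<ptr xml:base="lib/foo.xml" target="#bar"/>'
recommended = '<ptr xml:base="lib/" target="foo.xml#bar"/>'

for ptr_xml in (discouraged, recommended):
    print(resolved_target(ptr_xml), is_fragment_only(ptr_xml))
```

Both forms resolve to the same absolute URI. The difference is that only the first has a fragment-only target, so only the first can be taken as a pointer into the current document.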


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 20:44:29 +0000
From:    Martin Mueller <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

I’m following this thread from a distance (much of it is way over my head), but it reminds me of something I’ve said before. Every few months or so there is an extended discussion on the TEI list that raises a tough issue.  It would be helpful to the members at large if the Council assumed the responsibility for digesting such a discussion into a one or two-page summary.  There might be a special place on the site for such papers. Sometimes these discussions reach agreement, sometimes they just help clarify positions.

The obvious objection to my suggestion is that the members of the Council have other and more pressing things to do. On the other hand, documentation is clearly a responsibility of the Council, and position papers of this type are a kind of documentation. Or one could say “Why don’t the readers of the list take or make the time to read the extended discussion?” A good question, but there is a very high time cost involved in tracking a discussion that has half a dozen participants and dozens of entries.

I can’t be the only follower of this list who would be grateful for a succinct account of “what was all this about?”

On 5/4/17, 3:32 PM, "TEI (Text Encoding Initiative) public discussion list on behalf of C. M. Sperberg-McQueen" <[hidden email] on behalf of [hidden email]> wrote:

    > On May 4, 2017, at 12:28 PM, Hugh Cayless <[hidden email]> wrote:
    >
    > That's a rather favorable interpretation on your part. One person agrees with you without elucidating,
   
    I’m not sure this is true.  What Eliot Kimber said is that in the context given, ‘#apple’ identifies the same thing as ‘http://www.dictionary.com/a.html#apple’.  Does that distinguish among the various interpretations (are there just two?) of the situation offered so far?
   
    I don’t think so.  I think the disagreement we have is not over the statement  affirmed by Eliot Kimber, but over the following two claims:
   
      C1 In the context described, ‘#apple’ is a same-document reference and
      can therefore by definition be dereferenced without a new retrieval action.
   
      C2 In the context described, ‘#apple’ does not refer to the element in the
      current document with xml:id=“apple” (if any); it  cannot be dereferenced
      without a new retrieval action.
   
    I intend C1 as a representation of the interpretation of 3986 I’ve been offering, and C2 as a representation of the interpretation offered by John McCaskey. (SRCBS and XVAI, in my note of earlier today.)
   
    > one says this discussion has jumped the shark (which is fair), and the third (Michael Kay) gives a fuller answer which adds up to "it depends". Michael Kay is quite correct that in the context where a document retrieval is *expected* to occur, the URI would indeed be computed with reference to its base and fetched.
   
    The URI is *always* computed with reference to its base. 
   
    Optimizations which produce the same result are, of course, allowed.  The preceding paragraph is a claim about the meaning of certain language constructs, not a claim about what the CPU and network controller do during evaluation of an expression by a conforming processor.
   
    If it is then determined to be a same-document URI, the resource identified by that URI is then “defined to be within” the current document; in consequence no new retrieval is necessary and a new retrieval should be avoided.  The 'should' here means that 3986 recommends that new retrievals be avoided, but does not forbid new retrievals; if conforming processors or specs have good reason for launching new retrievals, that's not a violation of the rules of 3986.  The XSLT spec defines the document() function as always launching a new retrieval.  (Note that this does not amount to any claim by the XSLT spec that the relevant resource is not within the current document.)
   
    >
    > The thing is, I'm not aware of any TEI attributes or element/attribute combinations which are defined as *forcing* a retrieval action. I'd be happy to be corrected if I'm missing any, of course.
    >
    > It's fair to ask not just how one might expect them to behave, but what same-document references *mean* in the context of TEI documents with @xml:base. I agree this is something we ought to make clear. I think there is some possibility of wiggle room, given that TEI has its own media type. But I also think that we'd be better off adhering to the letter of RFC 3986. The use of same-document references in TEI documents is ubiquitous, and I'm firmly against anything that might break them.
   
    Is it clear what counts as breakage, here?
   
    If a given interpretation of the URI specs causes some URI references to break (by which I mean: to have an interpretation different from what the encoders intended), can we be confident that a contrary interpretation will not break any?  Or is it the case that one interpretation will break some URI references, and a different interpretation will break others?
   
    When there are two possible interpretations of a given rule in a spec, it’s seldom the case that everyone interprets it the same way.  There is some risk that your choice does not lie between breaking things in TEI documents and not breaking them, but between breaking those belonging to one project and breaking those belonging to another project.
   
    > For what it's worth, modern web browsers seem to agree with your interpretation (mutatis mutandis—HTML base is not @xml:base). As far as I can tell, probably because of a desire on the part of the Mozilla developers back in the day to maintain compatibility with IE 4(!).[1]
   
    Can you expound? 
   
    Do you mean that in an HTML 4.01 or XHTML 1.0 document at http://example.com/doc.html containing <base href=“http://example.org/base.html”>, a link of the form <a href=“#quovadis”>where does this connect to?</a> will go to … where? 
   
    I think the two obvious things one might expect to see in the address bar after traversing that link are
   
        http://example.com/doc.html#quovadis
        http://example.org/base.html#quovadis
   
    In the first case, this will happen either after a document refresh or without a refresh (just scrolling to the place).
   
    If a browser shows one of these addresses, does that constitute a claim that the other identifies a different resource?  Or does it only constitute a decision on the part of the browser about which of the two possible paths to a given resource it should take?
   
   
    > To further complicate matters, the author of RFC 3986, Roy Fielding, has said that using @xml:base in the way you propose, i.e. to enable shorthand references rather than to set a canonical URI for the current document, is abusive.[2]
   
    Thank you for that reference.
   
    I think the rules in 3986 make perfect sense if one assumes Roy Fielding’s principle that the base URI within a document should be the base URI of the document, and that using xml:base or html:base to shorten references which would otherwise be long is not a scenario worth bending over backwards for.  (That is, it seems to me to simplify matters, rather than complicating them.)
   
    If one doesn’t want to make that assumption, some relatively simple rules like the following might go some distance towards reducing the likelihood of unpleasant surprises:
   
      - Use the form “#fragment” only for references to locations in the current document.  These will always be same-document references within the meaning of RFC 3986.
      - Use xml:base to set all but the last bit of the URI, but not to set a full document URI:  xml:base=“http://dictionary.example.org/entries/” with relative references to “a.html#apple” and “a.html#anodyne” will be better than xml:base=“http://dictionary.example.org/entries/a.html” with relative references to “#apple” and “#anodyne”.  References of the first kind will never be taken to be same-document references.
   
    Note, however, that while “#fragment” will always be a same-document reference, it will ALSO always be a reference to the given fragment in the resource identified by the base URI.  If that’s not logically the same as the resource within which the reference occurs, you’re playing with fire:  RFC 3986 says “#fragment” “should” be dereferenced without a new retrieval, not that it must be.  Any software will be perfectly within its rights to retrieve the base URI and look for the fragment there.
    
    >
    > References:
    > 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
    > 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris
   
   
    ********************************************
    C. M. Sperberg-McQueen
    Black Mesa Technologies LLC
    [hidden email]
    http://www.blackmesatech.com
    ********************************************
   

------------------------------

Date:    Thu, 4 May 2017 17:07:00 -0400
From:    Hugh Cayless <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

On Thu, May 4, 2017 at 4:32 PM, C. M. Sperberg-McQueen <
[hidden email]> wrote:

>
>
> > one says this discussion has jumped the shark (which is fair), and the
> third (Michael Kay) gives a fuller answer which adds up to "it depends".
> Michael Kay is quite correct that in the context where a document retrieval
> is *expected* to occur, the URI would indeed be computed with reference to
> its base and fetched.
>
> The URI is *always* computed with reference to its base.
>

Right. It's the "and fetched" part that's crucial. In a context where
retrieval is forced, if the base is different from the document URI you'd
expect to end up somewhere else.

>
> Optimizations which produce the same result are, of course, allowed.  The
> preceding paragraph is a claim about the meaning of certain language
> constructs, not a claim about what the CPU and network controller do during
> evaluation of an expression by a conforming processor.
>
> If it is then determined to be a same-document URI, the resource
> identified by that URI is then “defined to be within” the current document;
> in consequence no new retrieval is necessary and a new retrieval should be
> avoided.  The 'should' here means that 3986 recommends that new retrievals
> be avoided, but does not forbid new retrievals; if conforming processors or
> specs have good reason for launching new retrievals, that's not a violation
> of the rules of 3986.  The XSLT spec defines the document() function as
> always launching a new retrieval.  (Note that this does not amount to any
> claim by the XSLT spec that the relevant resource is not within the current
> document.)
>
> >
> > The thing is, I'm not aware of any TEI attributes or element/attribute
> combinations which are defined as *forcing* a retrieval action. I'd be
> happy to be corrected if I'm missing any, of course.
> >
> > It's fair to ask not just how one might expect them to behave, but what
> same-document references *mean* in the context of TEI documents with
> @xml:base. I agree this is something we ought to make clear. I think there
> is some possibility of wiggle room, given that TEI has its own media type.
> But I also think that we'd be better off adhering to the letter of RFC
> 3986. The use of same-document references in TEI documents is ubiquitous,
> and I'm firmly against anything that might break them.
>
> Is it clear what counts as breakage, here?
>

Well, from my selfish perspective, "breakage" mainly means we've got to go
all over the Guidelines and add notes to the effect that '#fragment'
pointers may behave differently if @xml:base is set. To an extent, wanting
to avoid this is sheer laziness on my part. But I'm also convinced that
<tei:ref target="#foo"> in P5 is intended to mean the same thing that <ref
target="foo"> did in P4, when the value of @target was IDREFS rather than
teidata.pointer+ (I believe you mentioned this earlier), and that the
introduction of @xml:base was not intended to affect that meaning. I think
we're better off assuming that same-document references are referring to
the document that contains them.
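To illustrate the P4/P5 correspondence I have in mind, here is a minimal Python sketch. It is not a real migration tool: the function name is invented for this message, and it assumes the P4 value is a whitespace-separated list of IDs pointing into the same document.

```python
def idrefs_to_pointers(idrefs):
    """Turn a P4-style IDREFS value into P5-style same-document pointers.

    "foo bar" becomes "#foo #bar": each ID gets a leading "#", and the
    whitespace-separated list structure is kept.
    """
    return " ".join("#" + name for name in idrefs.split())


print(idrefs_to_pointers("foo"))
print(idrefs_to_pointers("foo bar"))
```

On this reading a P5 "#foo" means what the P4 "foo" meant: the element in this document with that ID, whatever @xml:base happens to be in scope.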


>
> If a given interpretation of the URI specs causes some URI references to
> break (by which I mean: to have an interpretation different from what the
> encoders intended), can we be confident that a contrary interpretation will
> not break any?  Or is it the case that one interpretation will break some
> URI references, and a different interpretation will break others?
>
> When there are two possible interpretations of a given rule in a spec, it’s
> seldom the case that everyone interprets it the same way.  There is some
> risk that your choice does not lie between breaking things in TEI documents
> and not breaking them, but between breaking those belonging to one project
> and breaking those belonging to another project.
>
> > For what it's worth, modern web browsers seem to agree with your
> interpretation (mutatis mutandis—HTML base is not @xml:base). As far as I
> can tell, probably because of a desire on the part of the Mozilla
> developers back in the day to maintain compatibility with IE 4(!).[1]
>
> Can you expound?
>
> Do you mean that in an HTML 4.01 or XHTML 1.0 document at
> http://example.com/doc.html containing <base href=“
> http://example.org/base.html”>, a link of the form <a
> href=“#quovadis”>where does this connect to?</a> will go to … where?
>

In my brief experimentation, Chrome, Firefox, and Safari will all load
http://example.org/base.html#quovadis in this case, rather than scroll to
the element with id="quovadis" in the current page.

>
> I think the two obvious things one might expect to see in the address bar
> after traversing that link are
>
>     http://example.com/doc.html#quovadis
>     http://example.org/base.html#quovadis
>
> In the first case, this will happen either after a document refresh or
> without a refresh (just scrolling to the place).
>
> If a browser shows one of these addresses, does that constitute a claim
> that the other identifies a different resource?  Or does it only constitute
> a decision on the part of the browser about which of the two possible paths
> to a given resource it should take?
>

I think it means the browser implementers decided to favor stability over
technical correctness.


>
> > To further complicate matters, the author of RFC 3986, Roy Fielding, has
> said that using @xml:base in the way you propose, i.e. to enable shorthand
> references rather than to set a canonical URI for the current document, is
> abusive.[2]
>
> Thank you for that reference.
>
> I think the rules in 3986 make perfect sense if one assumes Roy Fielding’s
> principle that the base URI within a document should be the base URI of the
> document, and that using xml:base or html:base to shorten references which
> would otherwise be long is not a scenario worth bending over backwards
> for.  (That is, it seems to me to simplify matters, rather than
> complicating them.)
>
> If one doesn’t want to make that assumption, some relatively simple rules
> like the following might go some distance towards reducing the likelihood
> of unpleasant surprises:
>
>   - Use the form “#fragment” only for references to locations in the
> current document.  These will always be same-document references within the
> meaning of RFC 3986.
>   - Use xml:base to set all but the last bit of the URI, but not to set a
> full document URI:  xml:base=“http://dictionary.example.org/entries/”
> with relative references to “a.html#apple” and “a.html#anodyne” will be
> better than xml:base=“http://dictionary.example.org/entries/a.html” with
> relative references to “#apple” and “#anodyne”.  These will never be taken
> to be same-document references.
>
> Note, however, that while “#fragment” will always be a same-document
> reference, it will ALSO always be a reference to the given fragment in the
> resource identified by the base URI.  If that’s not logically the same as
> the resource within which the reference occurs, you’re playing with fire:
> RFC 3986 says “#fragment” “should” be dereferenced without a new retrieval,
> not that it must be.  Any software will be perfectly within its rights to
> retrieve the base URI and look for the fragment there.
>

I think this is perfectly reasonable. I believe we're justified in saying
what the TEI expects "#fragment" to mean in the context of a TEI document,
but we can't guarantee that some piece of software that processes your
document won't make different decisions than we expect.

>
> >
> > References:
> > 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> > 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris
>
>
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> [hidden email]
> http://www.blackmesatech.com
> ********************************************
>
>

------------------------------

End of TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)
**********************************************************

Re: TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)

Emmanuelle Morlock
Thank you Matthew and Franz,
My mistake! I should have read the Guidelines more carefully.
Thanks a lot for your explanations.
Best,
Emmanuelle


From: MLH [hidden email]
Reply-To: MLH [hidden email]
Date: 5 May 2017 at 11:05:31
To: TEI (Text Encoding Initiative) public discussion list [hidden email], [hidden email] [hidden email]
Subject: Re: TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)

Dear Emmanuelle,

My understanding of the two numbers in @writtenLines (also @ruledLines) is that the first number gives the minimum number of lines per page or column in the codex as a whole, and the second gives the maximum number of lines per page or column in the codex as a whole.

e.g.

columns="2" writtenLines="20 30"

would not mean that column a had 20 lines and column b 30 lines, but that the codex as a whole had 2 columns per page and between 20 and 30 lines per column throughout.

similarly

columns="1" writtenLines="20 30"

would mean a codex in long lines with between 20 and 30 lines per page throughout.
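
For anyone processing these values, here is a minimal sketch of the reading described above, in Python. The function and class names are hypothetical, and treating a single number as "minimum equals maximum" is my assumption about the one-number case, not something stated in this thread.

```python
from typing import NamedTuple


class LineRange(NamedTuple):
    """Minimum and maximum lines per page or column, for the whole codex."""
    minimum: int
    maximum: int


def parse_written_lines(value: str) -> LineRange:
    """Parse a @writtenLines (or @ruledLines) value of one or two integers."""
    numbers = [int(part) for part in value.split()]
    if len(numbers) == 1:
        # Assumption: a single number means the count does not vary.
        return LineRange(numbers[0], numbers[0])
    if len(numbers) == 2:
        return LineRange(numbers[0], numbers[1])
    raise ValueError(f"expected one or two numbers, got {value!r}")


print(parse_written_lines("20 30"))
```

Note that the number of columns plays no part in the parsing: the range applies to every column alike.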

Best wishes,

Matthew




From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of TEI-L automatic digest system <[hidden email]>
Sent: 05 May 2017 04:00
To: [hidden email]
Subject: TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)
 
There are 10 messages totaling 2938 lines in this issue.

Topics of the day:

  1. "May contain: Empty element"
  2. question about attribute @writtenLines (metadata)
  3. @xml:base with @rendition (and maybe other pointers) (8)

----------------------------------------------------------------------

Date:    Thu, 4 May 2017 11:28:58 +0100
From:    Lou Burnard <[hidden email]>
Subject: Re: "May contain: Empty element"

On 03/05/17 16:19, John P. McCaskey wrote:
> When the spec for an element says “May contain: Empty element” that
> does not actually mean the spec’d element can contain any other
> element as long as the contained one is empty, right? It actually
> means “May contain: No other element,” right?
>
> Should that be changed?
>
>

The "may contain" part is boilerplate text provided by the stylesheet
in various languages: it might be tricky to change it. The other part is
meant to imply that the element in question cannot contain anything,
i.e. it is empty. Since there is now a proposal in the works (see
https://github.com/TEIC/TEI/issues/1596) to allow for an empty content
model to be represented in an ODD by using an explicit element called
"<empty/>", that text may need to be revisited. You might like to
raise a Stylesheets ticket to remind someone to review it.


------------------------------

Date:    Thu, 4 May 2017 12:45:53 +0200
From:    Emmanuelle Morlock <[hidden email]>
Subject: question about attribute @writtenLines (metadata)

Dear list,

I was just wondering why the attribute @writtenLines on layout can contain either one or two numbers (to represent the number of lines of one or two columns), but not more.

My question is mainly out of curiosity; I don’t have a use case. It just seems slightly odd (though I assume that in the manuscript world it’s very rare to have more than two columns). But what if? What would you do if you had more than two columns?

thanks !
Best, 
-- 
Emmanuelle Morlock
IE CNRS - Humanités numériques & TEI (Text Encoding Initiative)
UMR 5189 HISoMA (Histoire et Sources des Mondes antiques) - Lyon
http://www.hisoma.mom.fr/annuaire/morlock-emmanuelle

06 85 84 69 16
@emma_morlock

------------------------------

Date:    Thu, 4 May 2017 10:16:19 -0400
From:    "John P. McCaskey" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

I asked about this over at XML-DEV,
http://lists.xml.org/archives/xml-dev/201705/msg00008.html.

Opinion there is with me and opposite the majority here.

Readers there don’t seem to think there is anything to debate. I got one
short answer and one dismissive comment about “questions that simple.”
Someone take a look and be sure I didn’t word my question unfairly.

Whichever interpretation TEI adopts, sounds like it should be documented
in the Guidelines or a note somewhere.

If the McCaskey/XML-DEV interpretation is adopted, the GitHub issue I
posted about @rendition stands. If not, that issue goes away.

John



On 5/3/2017 8:03 PM, Hugh Cayless wrote:
> Those are the cards we've been dealt, yeah.
>
> Sent from my phone.
>
> On May 3, 2017, at 19:53, John P. McCaskey <[hidden email]
> <[hidden email]>> wrote:
>
>> So, bottom line: A standalone fragment identifier refers to the
>> loaded document and all the xml:base values above it in the hierarchy
>> are irrelevant. Encoders cannot use xml:base to direct a standalone
>> #fragment value to a location outside the loaded document.
>>
>> Is that right?
>>
>> --
>>
>>
>> On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>>> No. This clarifies the expected behavior when (e.g.) you have
>>> @xml:base="#frag". It says nothing whatever about <p
>>> rendition="#foo">. I don't think these specifications interact in
>>> the way you're positing. Quite the opposite.
>>>
>>> On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey
>>> <[hidden email] <[hidden email]>> wrote:
>>>
>>>     The special behavior of same-document references in RFC 3986 is
>>>     disavowed by W3C:
>>>
>>>         4.4 Interpretation of same-document references
>>>
>>>         RFC 3986 defines certain relative URI references, in
>>>         particular the empty string and those of the form #fragment,
>>>         as same-document references. Dereferencing of same-document
>>>         references is handled specially. However, their use as the
>>>         value of an *xml:base attribute does not involve
>>>         dereferencing*, and XML Base processors should resolve them
>>>         in the usual way. In particular, xml:base="" does not reset
>>>         the base URI to that of the containing document.
>>>
>>>         Note:
>>>
>>>         Some existing processors do treat these xml:base values as
>>>         resetting the base URI to that of the containing document,
>>>         so the use of such values is strongly discouraged.
>>>
>>>     This says:
>>>
>>>         RFC 3986 defines special “dereferencing” of empty strings
>>>         and #fragments. But over here in XML-land, we don’t do
>>>         “dereferencing.“ That’s not a word we use here. Ignore that
>>>         stuff about same-document references. Just resolve empty
>>>         strings and #fragments as specified in the W3C
>>>         Recommendation above. And those of you who did carry that
>>>         stuff over from 3986 to XML-land, shame on you. You messed
>>>         things up for the rest of us.
>>>
>>>
>>>     A note has:
>>>
>>>         5. The meanings of xml:base="" and xml:base="#frag" have
>>>         been clarified;
>>>
>>>     This says:
>>>
>>>         In this second version of this recommendation, we added
>>>         paragraph 4.4 specifically to get you 3986 people to stop
>>>         polluting our W3C with your special cases. Stop doing that.
>>>
>>>     No?
>>>
>>>     -- John
>>>
>>>
>>>
>>>
>>>
>>>
>>>     On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>>>     That's exactly what it does. It's just that the behavior of
>>>>     same-document references is prescribed in such a way that they
>>>>     end up resolving to the current document regardless of the
>>>>     value of the @xml:base.
>>>>
>>>>     Put another way, @xml:base has an influence on what the client
>>>>     application will retrieve when it dereferences a URI and
>>>>     retrieves the referenced document, but in the case of
>>>>     same-document references no such retrieval is expected to
>>>>     occur—it's assumed the client already has the document.
>>>>
>>>>     On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey
>>>>     <[hidden email] <[hidden email]>> wrote:
>>>>
>>>>         Oh. I thought it obvious they all resolved the same.
>>>>
>>>>         I think the same about this one. You think otherwise?
>>>>
>>>>         <div xml:base="http://www.myteiproject.com/">
>>>>           <div xml:base="images/">
>>>>             <div>
>>>>               <graphic url="logo.jpg"/>
>>>>             </div>
>>>>             <div xml:base="http://www.dictionary.com/words/">
>>>>               <p xml:base="a.html">
>>>>                 <ref target="#apple">apple</ref>
>>>>               </p>
>>>>             </div>
>>>>           </div>
>>>>         </div>
>>>>
>>>>         I didn’t think xml:base had any necessary relation to the
>>>>         current document. I thought in XML (not HTML), it just sets
>>>>         a path for URIs below it.
>>>>
>>>>         --
>>>>
>>>>         On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>>>         Well, we don't know about the third one, except that it
>>>>>         points to whatever element *in the same document* has the
>>>>>         @xml:id "apple". When an application attempts to
>>>>>         dereference it, it should make the URI in @ref absolute
>>>>>         and check it against the base
>>>>>         (http://www.dictionary.com/words/a.html), discover they
>>>>>         are identical, decide it doesn't need to fetch anything,
>>>>>         and go looking in the current document for the element
>>>>>         with the id "apple".
>>>>>         On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey
>>>>>         <[hidden email]
>>>>>         <[hidden email]>> wrote:
>>>>>
>>>>>             Are people proposing that these targets do not all
>>>>>             resolve to the same
>>>>>             http://www.dictionary.com/words/a.html#apple?
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/">
>>>>>                 <p xml:base="words/">
>>>>>                     <ref target="a.html#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/words/">
>>>>>                 <p xml:base="a.html">
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/words/a.html">
>>>>>                 <p>
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>
>>>>>             <div xml:base="http://www.dictionary.com/">
>>>>>                 <p>
>>>>>                     <ref target="words/a.html#apple">apple</ref>
>>>>>                 </p>
>>>>>             </div>
>>>>>             --
>>>>>

------------------------------

Date:    Thu, 4 May 2017 11:37:19 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 3, 2017, at 5:53 PM, John P. McCaskey <[hidden email]> wrote:
>
> So, bottom line: A standalone fragment identifier refers to the loaded document and all the xml:base values above it in the hierarchy are irrelevant. Encoders cannot use xml:base to direct a standalone #fragment value to a location outside the loaded document.
>
> Is that right?

Yes and no.  For some purposes, the ways in which the answer is 'no'
are pedantic and can be ignored; for others they seem important.
I did not respond to your summary yesterday, because objecting to
the wording you used seemed unnecessarily pedantic.  Also, I
overlooked the second sentence, which I think is the wrong conclusion
to draw.

Given the following fragment of resource http://example.org/eg.xml

  <div xml:base="http://www.dictionary.com/a.html">
    <p>
      <ref target="#apple">Apple</ref>
      <ref target="a.html#avocado">Avocado</ref>
      <ref target="http://www.dictionary.com/a.html#anise">Anise</ref>
    </p>
  </div>

we can consider two ways of interpreting the target attributes.

Note that the discussion below ignores some possibly salient facts:

  - URIs can denote different resources at different moments.
    (The discussion assumes the URI-resource mapping is not changing.)

  - A given resource can have multiple representations.  (The
    discussion ignores any resulting complications.)

  - URIs whose path component ends in .xml and .html do not
    necessarily have particular MIME types, so there is no guarantee
    that a fragment identifier like #apple will have similar
    meanings.  (The discussion assumes that fragment identifiers
    point to elements assigned IDs by the HTML 'id' attribute
    and/or the xml:id attribute.)

Interpretation 1 ('xml:base values are irrelevant').  Ignore xml:base
when resolving '#apple' [but not when resolving other relative
references].  The target attributes are interpreted as denoting

    (a) http://example.org/eg.xml#apple
    (b) http://www.dictionary.com/a.html#avocado
    (c) http://www.dictionary.com/a.html#anise

None of these have anything to do with any of:

    (d) http://www.dictionary.com/a.html#apple
    (e) http://example.org/eg.xml#avocado
    (f) http://example.org/eg.xml#anise   

Of these, (a) is a same-document reference and RFC 3986 says it
"should" be dereferenced without a new retrieval action.  If a new
retrieval action is nevertheless launched, the resource retrieved is
(a).

[It is not clear to me whether the XVAI ('xml:base values are
irrelevant') interpretation takes a position on whether any of these
other than (a) are same-document references.]

Interpretation 2 ('same-document references can be surprising', or
SRCBS).  Resolve all relative references against the base URI in the
usual way.  Dereference same-document references either by looking in
the same document (as recommended by RFC 3986) or by launching a new
retrieval operation.

The target attributes are resolved to the absolute forms

    (d) http://www.dictionary.com/a.html#apple
    (b) http://www.dictionary.com/a.html#avocado
    (c) http://www.dictionary.com/a.html#anise

All of these are same-document references, so according to RFC 3986,
they should be dereferenced without a new retrieval action.  If a
retrieval action is nevertheless launched, it will go to URIs (d),
(b), (c) respectively, not (a), (e), (f).  From the fact that (d),
(b), and (c) can be dereferenced without a new retrieval, it follows
(as far as I can tell) that these three resources can also be denoted
by URIs (a), (e), (f).

The XVAI and SRCBS interpretations agree on the following proposition,
which has important relevance for operations on the data:

P1 The relative reference target="#apple" can be dereferenced by
locating the element in the current document with xml:id="apple", if
such an element exists.

For people whose main interest is the truth or falsity of that
proposition, then, the answer is "yes, that's right" -- the effect is
the same, and all else is just pilpul.

From P1, it follows (I think) that

P2 The resource identified by target="#apple" is identified by the
absolute URI http://example.org/eg.xml#apple ((a) above).

The two interpretations disagree, or seem to disagree, on a number of
other propositions, most obviously:

P3 The relative reference '#apple' does not identify the resource
identified by http://www.dictionary.com/a.html#apple.

XVAI does not actually entail P3, but it is compatible with P3.  (To
reach P3 it is necessary to assume some rule like "No two URIs
identify the same thing" or "If we don't know that a URI identifies a
thing, then it does not identify that thing.")  SRCBS entails the
negation of P3.

They also prescribe different URIs for the case that software
determines to perform a fresh retrieval action for the relative
reference #apple: XVAI prescribes the absolute URI (a), SRCBS
prescribes URI (d).

I don’t believe anyone has seriously suggested XVAI as the
relevant rule of interpretation for examples like the one given;
what I have suggested (and I have understood Hugh Cayless
to be agreeing with) is SRCBS.  Operationally, they can have
similar results in some circumstances (specifically:  they can
both result in no new retrieval action being undertaken in order
to dereference ‘#apple’), but they differ in ways which can be
critical.
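
To make the difference concrete, here is a small Python sketch of the two
interpretations applied to the three target values in the example above.  It
computes only the absolute URI each interpretation would use if a fresh
retrieval were launched; it does not model the decision whether to retrieve.
The function names are hypothetical labels for the two readings, and urljoin
is Python's standard RFC 3986 reference resolution.

```python
from urllib.parse import urljoin

DOCUMENT_URI = "http://example.org/eg.xml"        # where the fragment lives
BASE_URI = "http://www.dictionary.com/a.html"     # the xml:base in scope
TARGETS = [
    "#apple",
    "a.html#avocado",
    "http://www.dictionary.com/a.html#anise",
]


def resolve_xvai(reference: str) -> str:
    """XVAI: ignore xml:base for bare '#fragment' references only."""
    base = DOCUMENT_URI if reference.startswith("#") else BASE_URI
    return urljoin(base, reference)


def resolve_srcbs(reference: str) -> str:
    """SRCBS: resolve every relative reference against the base URI."""
    return urljoin(BASE_URI, reference)


for target in TARGETS:
    print(f"{target:45} XVAI:  {resolve_xvai(target)}")
    print(f"{'':45} SRCBS: {resolve_srcbs(target)}")
```

Only '#apple' comes out differently: URI (a) under XVAI, URI (d) under SRCBS.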


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 14:28:37 -0400
From:    Hugh Cayless <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

That's a rather favorable interpretation on your part. One person agrees
with you without elucidating, one says this discussion has jumped the shark
(which is fair), and the third (Michael Kay) gives a fuller answer which
adds up to "it depends". Michael Kay is quite correct that in the context
where a document retrieval is *expected* to occur, the URI would indeed be
computed with reference to its base and fetched.

The thing is, I'm not aware of any TEI attributes or element/attribute
combinations which are defined as *forcing* a retrieval action. I'd be
happy to be corrected if I'm missing any, of course.

It's fair to ask not just how one might expect them to behave, but what
same-document references *mean* in the context of TEI documents with
@xml:base. I agree this is something we ought to make clear. I think there
is some possibility of wiggle room, given that TEI has its own media type.
But I also think that we'd be better off adhering to the letter of RFC
3986. The use of same-document references in TEI documents is ubiquitous,
and I'm firmly against anything that might break them.

For what it's worth, modern web browsers seem to agree with your
interpretation (mutatis mutandis: HTML base is not @xml:base). As far as I
can tell, this is probably because of a desire on the part of the Mozilla
developers back in the day to maintain compatibility with IE 4(!).[1]

To further complicate matters, the author of RFC 3986, Roy Fielding, has
said that using @xml:base in the way you propose, i.e. to enable shorthand
references rather than to set a canonical URI for the current document, is
abusive.[2]

Given all this, I still agree with Michael Sperberg-McQueen's fuller
explication of the issues at hand. I would interpret same-document
references as pointing to the current document, with the caveat that there
might be, now or in the future, certain pointer attributes or
element/attribute combinations that mandate a retrieval action. Such a
retrieval would necessarily use whatever base was defined for the URI in
question.
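
The same-document test I described earlier in the thread (make the reference
absolute, then compare it with the base, fragment aside) could be sketched in
Python roughly as follows. This is my reading of RFC 3986 section 4.4, not a
normative implementation, and the function name is made up for illustration.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(base_uri: str, reference: str) -> bool:
    """True if the reference, once resolved, equals the base URI apart
    from any fragment, i.e. no retrieval should be needed."""
    absolute = urljoin(base_uri, reference)
    return urldefrag(absolute).url == urldefrag(base_uri).url


# A bare fragment is always a same-document reference, whatever the base.
print(is_same_document("http://www.dictionary.com/words/a.html", "#apple"))
# A reference with a path step is one only if it resolves back to the base.
print(is_same_document("http://www.dictionary.com/words/", "a.html#apple"))
```

An attribute that mandated a retrieval would skip this test and simply fetch
the resolved URI.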

References:
1.
http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris

On Thu, May 4, 2017 at 10:16 AM, John P. McCaskey <[hidden email]>
wrote:

> I asked about this over at XML-DEV,
> http://lists.xml.org/archives/xml-dev/201705/msg00008.html.
>
> Opinion there is with me and opposite the majority here.
>
> Readers there don’t seem to think there is anything to debate. I got one
> short answer and one dismissive comment about ”questions that simple.”
> Someone take a look and be sure I didn’t word my question unfairly.
> Whichever interpretation TEI adopts, sounds like it should be documented
> in the Guidelines or a note somewhere.
>
> If the McCaskey/XML-DEV interpretation is adopted, the GitHub issue I
> posted about @rendition stands. If not, that issue goes away.
>
> John
>
>
>
> On 5/3/2017 8:03 PM, Hugh Cayless wrote:
>
> Those are the cards we've been dealt, yeah.
>
> Sent from my phone.
>
> On May 3, 2017, at 19:53, John P. McCaskey <[hidden email]>
> wrote:
>
> So, bottom line: A standalone fragment identifier refers to the loaded
> document and all the xml:base values above it in the hierarchy are
> irrelevant. Encoders cannot use xml:base to direct a standalone #fragment
> value to a location outside the loaded document.
>
> Is that right?
>
> --
>
>
> On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>
> No. This clarifies the expected behavior when (e.g.) you have @xml:base="#frag".
> It says nothing whatever about <p rendition="#foo">. I don't think these
> specifications interact in the way you're positing. Quite the opposite.
>
> On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey <[hidden email]
> > wrote:
>
>> The special behavior of same-document references in RFC 3986 is disavowed
>> by W3C:
>>
>> 4.4 Interpretation of same-document references
>>
>> RFC 3986 defines certain relative URI references, in particular the empty
>> string and those of the form #fragment, as same-document references.
>> Dereferencing of same-document references is handled specially. However,
>> their use as the value of an *xml:base attribute does not involve dereferencing*,
>> and XML Base processors should resolve them in the usual way. In
>> particular, xml:base="" does not reset the base URI to that of the
>> containing document.
>>
>> Note:
>>
>> Some existing processors do treat these xml:base values as resetting the
>> base URI to that of the containing document, so the use of such values is
>> strongly discouraged.
>>
>> This says:
>>
>> RFC 3986 defines special “dereferencing” of empty strings and #fragments.
>> But over here in XML-land, we don’t do “dereferencing.“ That’s not a word
>> we use here. Ignore that stuff about same-document references. Just resolve
>> empty strings and #fragments as specified in the W3C Recommendation above.
>> And those of you who did carry that stuff over from 3986 to XML-land, shame
>> on you. You messed things up for the rest of us.
>>
>>
>> A note has:
>>
>> 5. The meanings of xml:base="" and xml:base="#frag" have been clarified;
>>
>> This says:
>>
>> In this second version of this recommendation, we added paragraph 4.4
>> specifically to get you 3986 people to stop polluting our W3C with your
>> special cases. Stop doing that.
>>
>> No?
>>
>> -- John
>>
>>
>>
>>
>>
>>
>> On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>
>> That's exactly what it does. It's just that the behavior of same-document
>> references is prescribed in such a way that they end up resolving to the
>> current document regardless of the value of the @xml:base.
>>
>> Put another way, @xml:base has an influence on what the client
>> application will retrieve when it dereferences a URI and retrieves the
>> referenced document, but in the case of same-document references no such
>> retrieval is expected to occur—it's assumed the client already has the
>> document.
>>
>> On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey <
>> [hidden email]> wrote:
>>
>>> Oh. I thought it obvious they all resolved the same.
>>>
>>> I think the same about this one. You think otherwise?
>>>
>>> <div xml:base="http://www.myteiproject.com/">
>>>   <div xml:base="images/">
>>>     <div>
>>>       <graphic url="logo.jpg"/>
>>>     </div>
>>>     <div xml:base="http://www.dictionary.com/words/">
>>>       <p xml:base="a.html">
>>>         <ref target="#apple">apple</ref>
>>>       </p>
>>>     </div>
>>>   </div>
>>> </div>
>>>
>>> I didn’t think xml:base had any necessary relation to the current
>>> document. I thought in XML (not HTML), it just sets a path for URIs below
>>> it.
>>>
>>> --
>>> On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>
>>> Well, we don't know about the third one, except that it points to
>>> whatever element *in the same document* has the @xml:id "apple". When
>>> an application attempts to dereference it, it should make the URI in @ref
>>> absolute and check it against the base
>>> (http://www.dictionary.com/words/a.html), discover they are identical,
>>> decide it doesn't need to fetch anything, and go looking in the current
>>> document for the element with the id "apple".
>>> On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey <
>>> [hidden email]> wrote:
>>>>
>>>> Are people proposing that these targets do not all resolve
>>>> to the same http://www.dictionary.com/words/a.html#apple?
>>>> <div xml:base="http://www.dictionary.com/">
>>>>     <p xml:base="words/">
>>>>         <ref target="a.html#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/words/">
>>>>     <p xml:base="a.html">
>>>>         <ref target="#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/words/a.html">
>>>>     <p>
>>>>         <ref target="#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>
>>>> <div xml:base="http://www.dictionary.com/">
>>>>     <p>
>>>>         <ref target="words/a.html#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>>     <p>
>>>>         <ref target="words/a.html#apple">apple</ref>
>>>>     </p>
>>>> </div>
>>>> --
>>>>
>>>>
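
For reference, the purely mechanical part of the four-div question quoted
above can be checked with a short Python sketch. It walks the nested xml:base
values and resolves each @target by ordinary RFC 3986 resolution (urljoin).
The wrapping <body> element and the document URI are assumptions added so the
sample parses as one document; the sketch says nothing about whether a
processor would treat any of these as same-document references.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# The four <div> elements from the question, inside a hypothetical <body>.
SAMPLE = """\
<body>
  <div xml:base="http://www.dictionary.com/">
    <p xml:base="words/"><ref target="a.html#apple">apple</ref></p>
  </div>
  <div xml:base="http://www.dictionary.com/words/">
    <p xml:base="a.html"><ref target="#apple">apple</ref></p>
  </div>
  <div xml:base="http://www.dictionary.com/words/a.html">
    <p><ref target="#apple">apple</ref></p>
  </div>
  <div xml:base="http://www.dictionary.com/">
    <p><ref target="words/a.html#apple">apple</ref></p>
  </div>
</body>
"""


def resolve_targets(xml_text, document_uri):
    """Return the absolute URI of every @target, honouring nested xml:base."""
    resolved = []

    def walk(element, base):
        # An xml:base value is itself resolved against the base in scope.
        base = urljoin(base, element.get(XML_BASE, ""))
        target = element.get("target")
        if target is not None:
            resolved.append(urljoin(base, target))
        for child in element:
            walk(child, base)

    walk(ET.fromstring(xml_text), document_uri)
    return resolved


uris = resolve_targets(SAMPLE, "http://example.org/eg.xml")
for uri in uris:
    print(uri)
```

Resolved this way, all four targets come out as the same absolute URI; the
disagreement in the thread is about what a processor should then do with the
second and third, which are written as bare "#apple".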

------------------------------

Date:    Thu, 4 May 2017 15:31:42 -0400
From:    "John P. McCaskey" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

What is the short bottom-line guidance for someone trying to encode a
document? Is it this?

    To point inside a TEI document, as for @rendition, use pointers of
    the form #destination. Any later changes to xml:base values will not
    mess up your internal # pointers.

    Don’t try to point outside your TEI document by using xml:base plus
    a #destination in the pointer attribute. No xml:base attribute will
    be prepended to a pointer that begins with a #.

Even if I and others find that idiosyncratic and surprising, it’s
unambiguous, has practical benefits, requires no knowledge of RFCs or
W3C specs, is easy to articulate, and is easy to encode to.
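
If that guidance were adopted, it would also be easy to check mechanically.
The Python sketch below lists bare #fragment pointers that have no matching
xml:id in the same document. The choice of @target and @rendition as the
attributes to inspect, the sample document, and the function name are all my
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """\
<text xmlns="http://www.tei-c.org/ns/1.0">
  <div xml:base="http://www.dictionary.com/words/a.html">
    <p xml:id="apple">An entry for apple.</p>
    <p><ref target="#apple">apple</ref> <ref target="#anodyne">anodyne</ref></p>
  </div>
</text>
"""


def dangling_fragment_pointers(xml_text, attributes=("target", "rendition")):
    """Return bare '#id' pointers with no matching xml:id in the document."""
    root = ET.fromstring(xml_text)
    known_ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for element in root.iter():
        for name in attributes:
            # Pointer attributes may hold several space-separated values.
            for pointer in element.get(name, "").split():
                if pointer.startswith("#") and pointer[1:] not in known_ids:
                    dangling.append(pointer)
    return dangling


print(dangling_fragment_pointers(SAMPLE))
```

The xml:base on the div plays no part in the check, which is the point of the
proposed rule.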

John



On 5/4/2017 2:28 PM, Hugh Cayless wrote:
> That's a rather favorable interpretation on your part. One person
> agrees with you without elucidating, one says this discussion has
> jumped the shark (which is fair), and the third (Michael Kay) gives a
> fuller answer which adds up to "it depends". Michael Kay is quite
> correct that in the context where a document retrieval is *expected*
> to occur, the URI would indeed be computed with reference to its base
> and fetched.
>
> The thing is, I'm not aware of any TEI attributes or element/attribute
> combinations which are defined as *forcing* a retrieval action. I'd be
> happy to be corrected if I'm missing any, of course.
>
> It's fair to ask not just how one might expect them to behave, but
> what same-document references *mean* in the context of TEI documents
> with @xml:base. I agree this is something we ought to make clear. I
> think there is some possibility of wiggle room, given that TEI has its
> own media type. But I also think that we'd be better off adhering to
> the letter of RFC 3986. The use of same-document references in TEI
> documents is ubiquitous, and I'm firmly against anything that might
> break them.
>
> For what it's worth, modern web browsers seem to agree with your
> interpretation (mutatis mutandis—HTML base is not @xml:base). As far
> as I can tell, probably because of a desire on the part of the Mozilla
> developers back in the day to maintain compatibility with IE 4(!).[1]
>
> To further complicate matters, the author of RFC 3986, Roy Fielding,
> has said that using @xml:base in the way you propose, i.e. to enable
> shorthand references rather than to set a canonical URI for the
> current document, is abusive.[2]
>
> Given all this, I still agree with Michael Sperberg-McQueen's fuller
> explication of the issues at hand. I would interpret same-document
> references as pointing to the current document, with the caveat that
> there might be, now or in the future, certain pointer attributes or
> element/attribute combinations that mandate a retrieval action. Such a
> retrieval would necessarily use whatever base was defined for the URI
> in question.
>
> References:
> 1.
> http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris
>
> On Thu, May 4, 2017 at 10:16 AM, John P. McCaskey
> <[hidden email] <[hidden email]>> wrote:
>
>     I asked about this over at XML-DEV,
>     http://lists.xml.org/archives/xml-dev/201705/msg00008.html.
>
>     Opinion there is with me and opposite the majority here.
>
>     Readers there don’t seem to think there is anything to debate. I
>     got one short answer and one dismissive comment about ”questions
>     that simple.” Someone take a look and be sure I didn’t word my
>     question unfairly.
>
>     Whichever interpretation TEI adopts, sounds like it should be
>     documented in the Guidelines or a note somewhere.
>
>     If the McCaskey/XML-DEV interpretation is adopted, the GitHub
>     issue I posted about @rendition stands. If not, that issue goes away.
>
>     John
>
>
>
>     On 5/3/2017 8:03 PM, Hugh Cayless wrote:
>>     Those are the cards we've been dealt, yeah.
>>
>>     Sent from my phone.
>>
>>     On May 3, 2017, at 19:53, John P. McCaskey
>>     <[hidden email] <[hidden email]>> wrote:
>>
>>>     So, bottom line: A standalone fragment identifier refers to the
>>>     loaded document and all the xml:base values above it in the
>>>     hierarchy are irrelevant. Encoders cannot use xml:base to direct
>>>     a standalone #fragment value to a location outside the loaded
>>>     document.
>>>
>>>     Is that right?
>>>
>>>     --
>>>
>>>
>>>     On 5/3/2017 6:20 PM, Hugh Cayless wrote:
>>>>     No. This clarifies the expected behavior when (e.g.) you have
>>>>     @xml:base="#frag". It says nothing whatever about <p
>>>>     rendition="#foo">. I don't think these specifications interact
>>>>     in the way you're positing. Quite the opposite.
>>>>
>>>>     On Wed, May 3, 2017 at 6:13 PM, John P. McCaskey
>>>>     <[hidden email] <[hidden email]>> wrote:
>>>>
>>>>         The special behavior of same-document references in RFC
>>>>         3986 is disavowed by W3C:
>>>>
>>>>             4.4 Interpretation of same-document references
>>>>
>>>>             RFC 3986 defines certain relative URI references, in
>>>>             particular the empty string and those of the form
>>>>             #fragment, as same-document references. Dereferencing
>>>>             of same-document references is handled specially.
>>>>             However, their use as the value of an *xml:base
>>>>             attribute does not involve dereferencing*, and XML
>>>>             Base processors should resolve them in the usual way.
>>>>             In particular, xml:base="" does not reset the base URI
>>>>             to that of the containing document.
>>>>
>>>>             Note:
>>>>
>>>>             Some existing processors do treat these xml:base values
>>>>             as resetting the base URI to that of the containing
>>>>             document, so the use of such values is strongly
>>>>             discouraged.
>>>>
>>>>         This says:
>>>>
>>>>             RFC 3986 defines special “dereferencing” of empty
>>>>             strings and #fragments. But over here in XML-land, we
>>>>             don’t do “dereferencing.“ That’s not a word we use
>>>>             here. Ignore that stuff about same-document references.
>>>>             Just resolve empty strings and #fragments as specified
>>>>             in the W3C Recommendation above. And those of you who
>>>>             did carry that stuff over from 3986 to XML-land, shame
>>>>             on you. You messed things up for the rest of us.
>>>>
>>>>
>>>>         A note has:
>>>>
>>>>             5. The meanings of xml:base="" and xml:base="#frag"
>>>>             have been clarified;
>>>>
>>>>         This says:
>>>>
>>>>             In this second version of this recommendation, we added
>>>>             paragraph 4.4 specifically to get you 3986 people to
>>>>             stop polluting our W3C with your special cases. Stop
>>>>             doing that.
>>>>
>>>>         No?
>>>>
>>>>         -- John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>         On 5/3/2017 5:09 PM, Hugh Cayless wrote:
>>>>>         That's exactly what it does. It's just that the behavior
>>>>>         of same-document references is prescribed in such a way
>>>>>         that they end up resolving to the current document
>>>>>         regardless of the value of the @xml:base.
>>>>>
>>>>>         Put another way, @xml:base has an influence on what the
>>>>>         client application will retrieve when it dereferences a
>>>>>         URI and retrieves the referenced document, but in the case
>>>>>         of same-document references no such retrieval is expected
>>>>>         to occur—it's assumed the client already has the document.
>>>>>
>>>>>         On Wed, May 3, 2017 at 4:29 PM, John P. McCaskey
>>>>>         <[hidden email]
>>>>>         <[hidden email]>> wrote:
>>>>>
>>>>>             Oh. I thought it obvious they all resolved the same.
>>>>>
>>>>>             I think the same about this one. You think otherwise?
>>>>>
>>>>>             <div xml:base="http://www.myteiproject.com/">
>>>>>               <div xml:base="images/">
>>>>>                 <div> <graphic url="logo.jpg"/> </div>
>>>>>                 <div xml:base="http://www.dictionary.com/words/">
>>>>>                   <p xml:base="a.html">
>>>>>                     <ref target="#apple">apple</ref>
>>>>>                   </p>
>>>>>                 </div>
>>>>>               </div>
>>>>>             </div>
>>>>>
>>>>>             I didn’t think xml:base had any necessary relation to
>>>>>             the current document. I thought in XML (not HTML), it
>>>>>             just sets a path for URIs below it.
>>>>>
>>>>>             --
>>>>>
>>>>>             On 5/3/2017 3:42 PM, Hugh Cayless wrote:
>>>>>>             Well, we don't know about the third one, except that
>>>>>>             it points to whatever element *in the same document*
>>>>>>             has the @xml:id "apple". When an application attempts
>>>>>>             to dereference it, it should make the URI in @ref
>>>>>>             absolute and check it against the base
>>>>>>             (http://www.dictionary.com/words/a.html), discover
>>>>>>             they are identical, decide it doesn't need to fetch
>>>>>>             anything, and go looking in the current document for
>>>>>>             the element with the id "apple".
>>>>>>             On Wed, May 3, 2017 at 3:27 PM, John P. McCaskey
>>>>>>             <[hidden email]
>>>>>>             <[hidden email]>> wrote:
>>>>>>
>>>>>>                 Are people proposing that these targets do not
>>>>>>                 all resolve to the same
>>>>>>                 http://www.dictionary.com/words/a.html#apple?
>>>>>>
>>>>>>                 <div xml:base="http://www.dictionary.com/">
>>>>>>                   <p xml:base="words/">
>>>>>>                     <ref target="a.html#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/words/">
>>>>>>                   <p xml:base="a.html">
>>>>>>                     <ref target="#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/words/a.html">
>>>>>>                   <p>
>>>>>>                     <ref target="#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>                 <div xml:base="http://www.dictionary.com/">
>>>>>>                   <p>
>>>>>>                     <ref target="words/a.html#apple">apple</ref>
>>>>>>                   </p>
>>>>>>                 </div>
>>>>>>
>>>>>>                 --
>>>>>>

------------------------------

Date:    Thu, 4 May 2017 14:32:37 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 4, 2017, at 12:28 PM, Hugh Cayless <[hidden email]> wrote:
>
> That's a rather favorable interpretation on your part. One person agrees with you without elucidating,

I’m not sure this is true.  What Eliot Kimber said is that in the context given, ‘#apple’ identifies the same thing as ‘http://www.dictionary.com/words/a.html#apple’.  Does that distinguish between the two interpretations (if there are just two), or among the various interpretations of the situation offered so far?

I don’t think so.  I think the disagreement we have is not over the statement  affirmed by Eliot Kimber, but over the following two claims:

  C1 In the context described, ‘#apple’ is a same-document reference and
  can therefore by definition be dereferenced without a new retrieval action.

  C2 In the context described, ‘#apple’ does not refer to the element in the
  current document with xml:id=“apple” (if any); it  cannot be dereferenced
  without a new retrieval action.

I intend C1 as a representation of the interpretation of 3986 I’ve been offering, and C2 as a representation of the interpretation offered by John McCaskey.  (SRCBS and XVAI, in my note of earlier today.)

> one says this discussion has jumped the shark (which is fair), and the third (Michael Kay) gives a fuller answer which adds up to "it depends". Michael Kay is quite correct that in the context where a document retrieval is *expected* to occur, the URI would indeed be computed with reference to its base and fetched.

The URI is *always* computed with reference to its base. 

Optimizations which produce the same result are, of course, allowed.  The preceding paragraph is a claim about the meaning of certain language constructs, not a claim about what the CPU and network controller do during evaluation of an expression by a conforming processor.

If it is then determined to be a same-document URI, the resource identified by that URI is then “defined to be within” the current document; in consequence no new retrieval is necessary and a new retrieval should be avoided.  The 'should' here means that 3986 recommends that new retrievals be avoided, but does not forbid new retrievals; if conforming processors or specs have good reason for launching new retrievals, that's not a violation of the rules of 3986.  The XSLT spec defines the document() function as always launching a new retrieval.  (Note that this does not amount to any claim by the XSLT spec that the relevant resource is not within the current document.)
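The two steps described above (always resolve against the base, then check whether the result is a same-document reference) can be sketched in Python with the standard library's urllib.parse.  This is only an illustration under stated assumptions: the document URI http://example.org/tei/doc.xml and the helper names in_scope_base, resolve, and is_same_document are made up for the example, and the same-document test follows one reading of RFC 3986 section 4.4 (the target is identical to the base URI aside from its fragment).  The four cases are the xml:base/target combinations quoted earlier in this thread.

```python
from functools import reduce
from urllib.parse import urljoin, urldefrag

# Hypothetical URI from which the TEI document itself was retrieved.
DOC_URI = "http://example.org/tei/doc.xml"


def in_scope_base(xml_base_values, doc_uri=DOC_URI):
    """Fold nested xml:base values (outermost first) onto the document URI."""
    return reduce(urljoin, xml_base_values, doc_uri)


def resolve(xml_base_values, reference):
    """The reference is always computed with respect to the in-scope base."""
    return urljoin(in_scope_base(xml_base_values), reference)


def is_same_document(xml_base_values, reference):
    """True if the resolved target equals the base URI, ignoring fragments."""
    base = in_scope_base(xml_base_values)
    return urldefrag(urljoin(base, reference)).url == urldefrag(base).url


# (nested xml:base values, value of @target), as in the examples in this thread
CASES = [
    (["http://www.dictionary.com/", "words/"], "a.html#apple"),
    (["http://www.dictionary.com/words/", "a.html"], "#apple"),
    (["http://www.dictionary.com/words/a.html"], "#apple"),
    (["http://www.dictionary.com/"], "words/a.html#apple"),
]

for bases, ref in CASES:
    print(resolve(bases, ref), is_same_document(bases, ref))
```

All four cases resolve to the same absolute URI; only the second and third are classified as same-document references by this test.  Whether a processor then avoids a new retrieval is a separate decision that the sketch does not model.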

>
> The thing is, I'm not aware of any TEI attributes or element/attribute combinations which are defined as *forcing* a retrieval action. I'd be happy to be corrected if I'm missing any, of course.
>
> It's fair to ask not just how one might expect them to behave, but what same-document references *mean* in the context of TEI documents with @xml:base. I agree this is something we ought to make clear. I think there is some possibility of wiggle room, given that TEI has its own media type. But I also think that we'd be better off adhering to the letter of RFC 3986. The use of same-document references in TEI documents is ubiquitous, and I'm firmly against anything that might break them.

Is it clear what counts as breakage, here?

If a given interpretation of the URI specs causes some URI references to break (by which I mean: to have an interpretation different from what the encoders intended), can we be confident that a contrary interpretation will not break any?  Or is it the case that one interpretation will break some URI references, and a different interpretation will break others?

When there are two possible interpretations of a given rule in a spec, it’s seldom the case that everyone interprets it the same way.  There is some risk that your choice does not lie between breaking things in TEI documents and not breaking them, but between breaking those belonging to one project and breaking those belonging to another project.

> For what it's worth, modern web browsers seem to agree with your interpretation (mutatis mutandis—HTML base is not @xml:base). As far as I can tell, probably because of a desire on the part of the Mozilla developers back in the day to maintain compatibility with IE 4(!).[1]

Can you expound? 

Do you mean that in an HTML 4.01 or XHTML 1.0 document at http://example.com/doc.html containing <base href=“http://example.org/base.html”>, a link of the form <a href=“#quovadis”>where does this connect to?</a> will go to … where? 

I think the two obvious things one might expect to see in the address bar after traversing that link are

    http://example.com/doc.html#quovadis
    http://example.org/base.html#quovadis

In the first case, this will happen either after a document refresh or without a refresh (just scrolling to the place).

If a browser shows one of these addresses, does that constitute a claim that the other identifies a different resource?  Or does it only constitute a decision on the part of the browser about which of the two possible paths to a given resource it should take?


> To further complicate matters, the author of RFC 3986, Roy Fielding, has said that using @xml:base in the way you propose, i.e. to enable shorthand references rather than to set a canonical URI for the current document, is abusive.[2]

Thank you for that reference.

I think the rules in 3986 make perfect sense if one assumes Roy Fielding’s principle that the base URI within a document should be the base URI of the document, and that using xml:base or html:base to shorten references which would otherwise be long is not a scenario worth bending over backwards for.  (That is, it seems to me to simplify matters, rather than complicating them.)

If one doesn’t want to make that assumption, some relatively simple rules like the following might go some distance towards reducing the likelihood of unpleasant surprises:

  - Use the form “#fragment” only for references to locations in the current document.  These will always be same-document references within the meaning of RFC 3986.
  - Use xml:base to set all but the last bit of the URI, but not to set a full document URI:  xml:base=“http://dictionary.example.org/entries/” with relative references to “a.html#apple” and “a.html#anodyne” will be better than xml:base=“http://dictionary.example.org/entries/a.html” with relative references to “#apple” and “#anodyne”.  References of the former kind will never be taken to be same-document references.

Note, however, that while “#fragment” will always be a same-document reference, it will ALSO always be a reference to the given fragment in the resource identified by the base URI.  If that’s not logically the same as the resource within which the reference occurs, you’re playing with fire:  RFC 3986 says “#fragment” “should” be dereferenced without a new retrieval, not that it must be.  Any software will be perfectly within its rights to retrieve the base URI and look for the fragment there.
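To make the "playing with fire" risk concrete, here is a small Python sketch of two processors that both stay within the "should" of RFC 3986 but disagree on where “#fragment” lands.  Everything in it is an assumption for illustration: the DOCS table stands in for retrievable resources and their xml:id values, and the function names avoid_retrieval and always_retrieve are invented.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical resources: URI -> set of identifiers found in that resource.
DOC = "http://example.org/tei/doc.xml"
BASE_FULL = "http://dictionary.example.org/entries/a.html"
BASE_DIR = "http://dictionary.example.org/entries/"
DOCS = {
    DOC: {"apple"},
    BASE_FULL: {"apple", "anodyne"},
}


def avoid_retrieval(current_doc, base, reference):
    """Same-document references are looked up in the document in hand."""
    target, fragment = urldefrag(urljoin(base, reference))
    if target == urldefrag(base).url:
        # Same-document reference: no new retrieval, search the current document.
        return current_doc, fragment in DOCS[current_doc]
    return target, fragment in DOCS.get(target, set())


def always_retrieve(base, reference):
    """Every reference is resolved and the target resource is fetched."""
    target, fragment = urldefrag(urljoin(base, reference))
    return target, fragment in DOCS.get(target, set())


# xml:base names a full document URI that differs from the current document:
print(avoid_retrieval(DOC, BASE_FULL, "#anodyne"))
print(always_retrieve(BASE_FULL, "#anodyne"))

# xml:base stops at the directory and the reference carries the file name:
print(avoid_retrieval(DOC, BASE_DIR, "a.html#anodyne"))
print(always_retrieve(BASE_DIR, "a.html#anodyne"))
```

With the full-document base the two strategies look in different resources and give different answers; with the directory-style base they agree, which is the point of the second rule above.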
 
>
> References:
> 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 14:42:11 -0600
From:    "C. M. Sperberg-McQueen" <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

> On May 4, 2017, at 1:31 PM, John P. McCaskey <[hidden email]> wrote:
>
> What is the short bottom-line guidance for someone trying to encode a document? Is it this?
>
> To point inside a TEI document, as for @rendition, use pointers of the form #destination. Any later changes to xml:base values will not mess up your internal # pointers.
>
> Don’t try to point outside your TEI document by using xml:base plus a #destination in the pointer attribute. No xml:base attribute will be prepended to a pointer that begins with a #.

No.   Sorry.  That’s not the way it works, and I think this discussion has already demonstrated that it’s a dangerous way to describe the behavior. 

If you want to keep things simple for the encoders, I’d rephrase this as something like:  Don’t try to point outside your TEI document by using xml:base plus a #destination in the pointer attribute.  It does not have the desired meaning.  Point outside your TEI document either using an absolute URI or an xml:base attribute plus the final part of the path (the ‘file name’ part) and the fragment identifier.  So NOT <ptr xml:base="lib/foo.xml" target="#bar"/> but <ptr xml:base="lib/" target="foo.xml#bar"/>.
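The difference between the two ptr forms can be checked with a short Python sketch using only the standard library.  The document URI http://example.org/tei/doc.xml and the helper name describe are assumptions for the example, and the same-document test is again one reading of RFC 3986 section 4.4, not a statement of what any particular TEI processor does.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

# ElementTree exposes xml:base under the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical URI of the TEI document that contains the pointer.
DOC_URI = "http://example.org/tei/doc.xml"


def describe(ptr_xml, doc_uri=DOC_URI):
    """Return (absolute target, is it a same-document reference?) for one <ptr>."""
    ptr = ET.fromstring(ptr_xml)
    # The element's own xml:base is resolved against the document URI.
    base = urljoin(doc_uri, ptr.get(XML_BASE, ""))
    target = urljoin(base, ptr.get("target"))
    same_document = urldefrag(target).url == urldefrag(base).url
    return target, same_document


print(describe('<ptr xml:base="lib/foo.xml" target="#bar"/>'))
print(describe('<ptr xml:base="lib/" target="foo.xml#bar"/>'))
```

Both forms yield the same absolute URI, but only the first is classified as a same-document reference, so only the first is at risk of being looked up in the document at hand instead of in lib/foo.xml.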


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

------------------------------

Date:    Thu, 4 May 2017 20:44:29 +0000
From:    Martin Mueller <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

I’m following this thread from a distance (much of it is way over my head), but it reminds me of something I’ve said before. Every few months or so there is an extended discussion on the TEI list that raises a tough issue.  It would be helpful to the members at large if the Council assumed the responsibility for digesting such a discussion into a one- or two-page summary.  There might be a special place on the site for such papers. Sometimes these discussions reach agreement, sometimes they just help clarify positions.

The obvious objection to my suggestion is that the members of the Council have other and more pressing things to do. On the other hand, documentation is clearly a responsibility of the Council, and position papers of this type are a kind of documentation. Or one could say “Why don’t the readers of the list take or make the time to read the extended discussion?” A good question, but there is a very high time cost involved in tracking a discussion that has half a dozen participants and dozens of entries.

I can’t be the only follower of this list who would be grateful for a succinct account of “what was all this about?”


------------------------------

Date:    Thu, 4 May 2017 17:07:00 -0400
From:    Hugh Cayless <[hidden email]>
Subject: Re: @xml:base with @rendition (and maybe other pointers)

On Thu, May 4, 2017 at 4:32 PM, C. M. Sperberg-McQueen <
[hidden email]> wrote:

>
>
> > one says this discussion has jumped the shark (which is fair), and the
> third (Michael Kay) gives a fuller answer which adds up to "it depends".
> Michael Kay is quite correct that in the context where a document retrieval
> is *expected* to occur, the URI would indeed be computed with reference to
> its base and fetched.
>
> The URI is *always* computed with reference to its base.
>

Right. It's the "and fetched" part that's crucial. In a context where
retrieval is forced, if the base is different from the document URI you'd
expect to end up somewhere else.

>
> Optimizations which produce the same result are, of course, allowed.  The
> preceding paragraph is a claim about the meaning of certain language
> constructs, not a claim about what the CPU and network controller do during
> evaluation of an expression by a conforming processor.
>
> If it is then determined to be a same-document URI, the resource
> identified by that URI is then “defined to be within” the current document;
> in consequence no new retrieval is necessary and a new retrieval should be
> avoided.  The 'should' here means that 3986 recommends that new retrievals
> be avoided, but does not forbid new retrievals; if conforming processors or
> specs have good reason for launching new retrievals, that's not a violation
> of the rules of 3986.  The XSLT spec defines the document() function as
> always launching a new retrieval.  (Note that this does not amount to any
> claim by the XSLT spec that the relevant resource is not within the current
> document.)
>
> >
> > The thing is, I'm not aware of any TEI attributes or element/attribute
> combinations which are defined as *forcing* a retrieval action. I'd be
> happy to be corrected if I'm missing any, of course.
> >
> > It's fair to ask not just how one might expect them to behave, but what
> same-document references *mean* in the context of TEI documents with
> @xml:base. I agree this is something we ought to make clear. I think there
> is some possibility of wiggle room, given that TEI has its own media type.
> But I also think that we'd be better off adhering to the letter of RFC
> 3986. The use of same-document references in TEI documents is ubiquitous,
> and I'm firmly against anything that might break them.
>
> Is it clear what counts as breakage, here?
>

Well, from my selfish perspective, "breakage" mainly means we've got to go
all over the Guidelines and add notes to the effect that '#fragment'
pointers may behave differently if @xml:base is set. To an extent, wanting
to avoid this is sheer laziness on my part. But I'm also convinced that
<tei:ref target="#foo"> in P5 is intended to mean the same thing that <ref
target="foo"> did in P4, when the value of @target was IDREFS rather than
teidata.pointer+ (I believe you mentioned this earlier), and that the
introduction of @xml:base was not intended to affect that meaning. I think
we're better off assuming that same-document references are referring to
the document that contains them.


>
> If a given interpretation of the URI specs causes some URI references to
> break (by which I mean: to have an interpretation different from what the
> encoders intended), can we be confident that a contrary interpretation will
> not break any?  Or is it the case that one interpretation will break some
> URI references, and a different interpretation will break others?
>
> When there are two possible interpretations of a given rule in a spec, it’s
> seldom the case that everyone interprets it the same way.  There is some
> risk that your choice does not lie between breaking things in TEI documents
> and not breaking them, but between breaking those belonging to one project
> and breaking those belonging to another project.
>
> > For what it's worth, modern web browsers seem to agree with your
> interpretation (mutatis mutandis—HTML base is not @xml:base). As far as I
> can tell, probably because of a desire on the part of the Mozilla
> developers back in the day to maintain compatibility with IE 4(!).[1]
>
> Can you expound?
>
> Do you mean that in an HTML 4.01 or XHTML 1.0 document at
> http://example.com/doc.html containing <base href=“
> http://example.org/base.html”>, a link of the form <a
> href=“#quovadis”>where does this connect to?</a> will go to … where?
>

In my brief experimentation, Chrome, Firefox, and Safari will all load
http://example.org/base.html#quovadis in this case, rather than scroll to
the element with id="quovadis" in the current page.

>
> I think the two obvious things one might expect to see in the address bar
> after traversing that link are
>
>     http://example.com/doc.html#quovadis
>     http://example.org/base.html#quovadis
>
> In the first case, this will happen either after a document refresh or
> without a refresh (just scrolling to the place).
>
> If a browser shows one of these addresses, does that constitute a claim
> that the other identifies a different resource?  Or does it only constitute
> a decision on the part of the browser about which of the two possible paths
> to a given resource it should take?
>

I think it means the browser implementers decided to favor stability over
technical correctness.


>
> > To further complicate matters, the author of RFC 3986, Roy Fielding, has
> said that using @xml:base in the way you propose, i.e. to enable shorthand
> references rather than to set a canonical URI for the current document, is
> abusive.[2]
>
> Thank you for that reference.
>
> I think the rules in 3986 make perfect sense if one assumes Roy Fielding’s
> principle that the base URI within a document should be the base URI of the
> document, and that using xml:base or html:base to shorten references which
> would otherwise be long is not a scenario worth bending over backwards
> for.  (That is, it seems to me to simplify matters, rather than
> complicating them.)
>
> If one doesn’t want to make that assumption, some relatively simple rules
> like the following might go some distance towards reducing the likelihood
> of unpleasant surprises:
>
>   - Use the form “#fragment” only for references to locations in the
> current document.  These will always be same-document references within the
> meaning of RFC 3986.
>   - Use xml:base to set all but the last bit of the URI, but not to set a
> full document URI:  xml:base=“http://dictionary.example.org/entries/”
> with relative references to “a.html#apple” and “a.html#anodyne” will be
> better than xml:base=“http://dictionary.example.org/entries/a.html” with
> relative references to “#apple” and “#anodyne”.  These will never be taken
> to be same-document references.
>
> Note, however, that while “#fragment” will always be a same-document
> reference, it will ALSO always be a reference to the given fragment in the
> resource identified by the base URI.  If that’s not logically the same as
> the resource within which the reference occurs, you’re playing with fire:
> RFC 3986 says “#fragment” “should” be dereferenced without a new retrieval,
> not that it must be.  Any software will be perfectly within its rights to
> retrieve the base URI and look for the fragment there.
>

I think this is perfectly reasonable. I believe we're justified in saying
what the TEI expects "#fragment" to mean in the context of a TEI document,
but we can't guarantee that some piece of software that processes your
document won't make different decisions than we expect.

>
> >
> > References:
> > 1. http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
> > 2. http://w3future.com/weblog/2005/08/14.xml#howToUseBaseUris
>
>
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> [hidden email]
> http://www.blackmesatech.com
> ********************************************
>
>

------------------------------

End of TEI-L Digest - 3 May 2017 to 4 May 2017 (#2017-101)
**********************************************************