xml:base and the two interpretations of #apple

C. M. Sperberg-McQueen
Yet another mail about xml:base.  You have been warned.

Consider the following example, adapted from the one suggested in
earlier discussion by John McCaskey and understood to be part of the
document at http://example.org/doc.xml:

   <div xml:base="http://dict.example.com/a.xml">
     <ref target="#apple">Apple</ref>
     <ref target="a.xml#apple">Apple</ref>
     <ref target="http://dict.example.com/a.xml#apple">Apple</ref>
     <p xml:id="apple"> ... </p>
   </div>

What is the meaning of the URI references in this fragment?  In the
earlier discussion many contributors seem to have been committed to
the assumption that the first reference must identify EITHER (a) the
element labeled "apple" in doc.xml, i.e. the same thing as

    http://example.org/doc.xml#apple

OR (b) the element labeled "apple" in http://dict.example.com/a.xml,
i.e. the same thing as

    http://dict.example.com/a.xml#apple

Examination of the text of the relevant specs led some contributors
(well, one) to the conclusion that the correct answer is neither (a)
nor (b) but (c), namely both at once, for all references in the
example; this conclusion
appears to have been unwelcome to some.  So unwelcome, in fact, that
both John McCaskey and Lou Burnard summarized the technical take-home
lesson by saying explicitly that the result was ONE of (a) or (b) BUT
NOT THE OTHER, thus shifting (as far as I can see) from one erroneous
interpretation to another.

In the discussion on xml-dev some people have asked how this problem
(I think they mean answer (c)) can be fixed, which has elicited an
example that may be worth reproducing here.  Consider a TEI document
which contains, inter alia, the lines

    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
        href="http://dict.example.com/a.xml">
        <xi:fallback><p>You need to run XInclude in order to see
        anything here …</p></xi:fallback>
    </xi:include>

The output, saved at http://example.org/doc.xml, might well contain a
div just like our initial example.  In this context, answer (c) seems
less an accidental result of some thoughtless changes to the
definition of URI resolution and more a coherent account of the
meaning of the references.

All of the contents of the div came, in this example, from
http://dict.example.com/a.xml, and in the context of that document all
will be interpreted as pointing to the element labeled "apple" in that
document, namely http://dict.example.com/a.xml#apple.  In the context
of the XInclude output document at .../doc.xml, all three references
identify an element which is, in fact, present in the current
document.  It is therefore possible to dereference them without
launching a new retrieval against host dict.example.com, and to reduce
unnecessary network traffic and unhelpful reinitialization of the
browser context (particularly irritating for voice browsers, according
to the discussions during development of RFC 3986), it is better that
no new retrieval be launched.  Which is exactly what RFC 3986 says
(section 4.4, on same-document references).
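
For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes that the in-scope base URI for the div is the value of its xml:base attribute, and it uses the standard urllib.parse module as a stand-in for the resolution algorithm of RFC 3986 section 5. The names BASE, REFS, resolve and is_same_document are mine, invented for illustration, and the same-document test is my reading of section 4.4 (identical to the base URI once any fragment is set aside), not code taken from any spec.

```python
from urllib.parse import urljoin, urldefrag

# Assumption: the in-scope base URI is the xml:base value on the div.
BASE = "http://dict.example.com/a.xml"

# The three URI references from the example fragment.
REFS = [
    "#apple",
    "a.xml#apple",
    "http://dict.example.com/a.xml#apple",
]


def resolve(ref, base=BASE):
    """Resolve a URI reference against the base URI."""
    return urljoin(base, ref)


def is_same_document(ref, base=BASE):
    """True if the resolved reference equals the base URI, ignoring fragments."""
    target, _fragment = urldefrag(resolve(ref, base))
    base_without_fragment, _ = urldefrag(base)
    return target == base_without_fragment


if __name__ == "__main__":
    for ref in REFS:
        print(ref, "->", resolve(ref), "| same-document:", is_same_document(ref))
```

Under these assumptions all three references resolve to the same absolute URI, and all three count as same-document references, so a processor may look for the element labeled "apple" in the document at hand rather than launching a retrieval.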

In earlier discussions I was unable to offer any plausible concrete
scenarios in which the double interpretation of these references would
be desirable; I hope that this XInclude scenario will help some
readers understand the logic of RFC 3986 and also understand that it
is not in fact an error, but a reasonable interpretation for an
important case in the XML context.

C. M. Sperberg-McQueen
Black Mesa Technologies LLC