how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
Hi,

For the authoring package of the Journal of the Text Encoding Initiative
I'm further investigating the incorporation of Schematron rules into ODD
<constraintSpec> elements. Our external ISO Schematron file makes use of
a couple of XSLT snippets to declare variables and keys that can be used
in Schematron tests (simplified):

   <schema xmlns="http://purl.oclc.org/dsdl/schematron"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           queryBinding="xslt2">
     <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
   
     <xsl:key name="ids" match="*" use="@xml:id"/>
     <xsl:variable name="straight.quotes">['"]</xsl:variable>
   
     <pattern>
       <rule context="tei:ref[@type eq 'bibl']">
         <assert test="key('ids', substring-after(@target, '#'))/self::tei:bibl">
           A bibliographic reference must point to an entry in the bibliography.
         </assert>
       </rule>
     </pattern>
   
     <pattern>
       <rule context="text()">
         <report test="matches(., $straight.quotes)">
           Don't use straight quotes.
         </report>
       </rule>
     </pattern>
   
   </schema>

I believe both patterns can't be expressed in regular Schematron
(without help of XSLT):
     -there's no xsl:key equivalent in Schematron for efficient
processing (instead, the Schematron spec explicitly allows the use of
<xsl:key>, see Annex C on p.21 of the PDF in
http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip)
     -since Schematron's <let name="" value=""/> element doesn't allow
text content, I don't see a way to define a regular expression that can
be used to match straight single and double quotes

As I'm finding my way in <constraintSpec> and friends, I've stumbled
upon <constraintSpec scheme="xsl">, which
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-constraintSpec.html 
defines as a legal constraint language within ODD. Yet, I couldn't find
any more information or examples. Could it be intended for constructs like:

   <constraintSpec ident="straight.quotes" scheme="xsl">
     <constraint>  
       <xsl:variable name="straight.quotes">['"]</xsl:variable>
     </constraint>
   </constraintSpec>

?

When this ODD snippet is transformed into a RelaxNG schema with the TEI
stylesheets, only the literal string content of this <constraintSpec>
survives: ['"], which of course is unusable.

Am I misunderstanding completely how @scheme="xsl" works, or even why I
think I need help of XSLT in the first place? If anyone has suggestions
for expressing above Schematron in ODD, this would be of great help.

Best,

Ron

--
Ron Van den Branden

Technical Editor
jTEI  - Journal of the Text Encoding Initiative
         http://jtei.revues.org/

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3

> On 26 Jan 2015, at 23:05, [hidden email] <[hidden email]> wrote:
>
> …


> I believe both patterns can't be expressed in regular Schematron (without help of XSLT):
>    -there's no xsl:key equivalent in Schematron for efficient processing (instead, the Schematron spec explicitly allows the use of <xsl:key>, see Annex C on p.21 of the PDF in http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip)

two observations:

a) Have you actually tried it without key()?  I talked to Mike Kay about this once, and he suggested that
using key() doesn’t give that much benefit in Saxon, as he detects what you are doing and sets up an index
anyway.

b) the builtin id() function does just what you want anyway, unless I mistake?


for the regexp about quotes, can you use e.g. \uFFFF where FFFF is the hexadecimal number of the code point you want to match?
or something similar.

> As I'm finding my way in <constraintSpec> and friends, I've stumbled upon <constraintSpec scheme="xsl">, which http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-constraintSpec.html defines as a legal constraint language within ODD. Yet, I couldn't find any more information or examples.

indeed. I don't think it has ever been used or implemented. It just seemed like a nice idea, from memory.


--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431




Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3
to follow up my own message, you can talk about straight quotes in an XPath regexp
using [#x22#x27]

there, isn’t that nice to know we don’t need the <let>?
--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431




Re: how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
In reply to this post by Sebastian Rahtz-3
Hi Sebastian,

Thanks for your comments.

On 27/01/2015 0:17, Sebastian Rahtz wrote:
> two observations:
>
> a) Have you actually tried it without key()?  I talked to Mike Kay about this once, and he suggested that
> using key() doesn’t give that much benefit in Saxon, as he detects what you are doing and sets up an index
> anyway.
>
> b) the builtin id() function does just what you want anyway, unless I mistake?
>

Great suggestion, id() indeed had slipped my mind. I've tested and it
seems to do the job. The other keys in our current Schematron can be
expressed as in-line XPath expressions, so I think we can manage without
<xsl:key>. Progress! Considering your first remark and the fact that the
RelaxNG will be used to validate journal articles (of moderate length),
performance won't suffer too much, probably. I'll give it a try.

> for the regexp about quotes, can you use e.g. \uFFFF where FFFF is the hexadecimal number of the code point you want to match?
> or something similar.
[...]
> to follow up my own message, you can talk about straight quotes in an XPath regexp
> using [#x22#x27]
>
> there, isn’t that nice to know we don’t need the <let>?

Sorry, I'm feeling completely dense, but still can't get it working:

     -<report test="matches(., '[&#x22;&#x27;]')"> is invalid, since the
entities are resolved, and the parser interprets &#x27; as the end of
the regexp
     -<report test="matches(., '[x22x27]')"> will match just the literal
characters 'x', '2', and '7'
     -<report test="matches(., '[\ux22;\ux27;]')"> is invalid, since 'u'
is not a valid escape character

Am I missing something obvious?

>> As I'm finding my way in <constraintSpec> and friends, I've stumbled upon <constraintSpec scheme="xsl">, which http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-constraintSpec.html defines as a legal constraint language within ODD. Yet, I couldn't find any more information or examples.
> indeed. I don't think it has ever been used or implemented. It just seemed like a nice idea, from memory.

It still is ;-). Thanks for confirming.

Best,

Ron

--
Ron Van den Branden

Technical Editor
jTEI  - Journal of the Text Encoding Initiative
         http://jtei.revues.org/

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3

> On 27 Jan 2015, at 10:07, [hidden email] wrote:
>
>> to follow up my own message, you can talk about straight quotes in an XPath regexp
>> using [#x22#x27]
>>
>> there, isn’t that nice to know we don’t need the <let>?
>
> Sorry, I'm feeling completely dense, but still can't get it working:
>
>    -<report test="matches(., '[&#x22;&#x27;]')"> is invalid, since the entities are resolved, and the parser interprets &#x27; as the end of the regexp
>    -<report test="matches(., '[x22x27]')"> will match just the literal characters 'x', '2', and '7'
>    -<report test="matches(., '[\ux22;\ux27;]')"> is invalid, since 'u' is not a valid escape character
>
> Am I missing something obvious?

yes, you’re not copying my example literally, with the # before the x

I may have made a mistake last night when testing this, but I believe the notation is correct. I certainly saw things
coming out without quotes.

I wish I could point you at an unequivocal documentation for the notation, but I can’t.

--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431




Re: how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
On 27/01/2015 11:14, Sebastian Rahtz wrote:

>> On 27 Jan 2015, at 10:07, [hidden email] wrote:
>> Sorry, I'm feeling completely dense, but still can't get it working:
>>
>>     -<report test="matches(., '[&#x22;&#x27;]')"> is invalid, since the entities are resolved, and the parser interprets &#x27; as the end of the regexp
>>     -<report test="matches(., '[x22x27]')"> will match just the literal characters 'x', '2', and '7'
>>     -<report test="matches(., '[\ux22;\ux27;]')"> is invalid, since 'u' is not a valid escape character
>>
>> Am I missing something obvious?
> yes, you’re not copying my example literally, with the # before the x
>

Blast, paste error, sorry. To confirm:
     <report test="matches(., '[#x22#x27]')">  only matches the literal
characters '#', 'x', '2' and '7'

> I may have made a mistake last night when testing this, but I believe the notation is correct. I certainly saw things
> coming out without quotes.
>
> I wish I could point you at an unequivocal documentation for the notation, but I can’t.
>

Ok, it's encouraging to know a direction, at least.

Best,

Ron

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3

> On 27 Jan 2015, at 10:32, [hidden email] wrote:
>
>
> Blast, paste error, sorry. To confirm:
>    <report test="matches(., '[#x22#x27]')">  only matches the literal characters '#', 'x', '2' and '7'

Sorry about that. I was definitely wrong, and my test was wrong :-{

--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431




Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3
I have come to the conclusion that my remarks yesterday were simply rubbish, and that
regexp expressions in XPath (taken from XSD) do not support a way of specifying a Unicode code point
except using entities (which, as we know, don’t work in this case).

back to <let>, I fear
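
One further route that sidesteps both entities and nested quotes, offered only as a sketch and assuming an XPath 2.0 query binding (queryBinding="xslt2"): although the regex syntax has no code-point escape, the character class can be assembled from code points with the standard codepoints-to-string() function. Code points 34 and 39 are the straight double and single quote; the surrounding pattern and the message text are illustrative.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="text()">
    <!-- codepoints-to-string((34, 39)) returns a two-character string:
         a straight double quote followed by a straight single quote -->
    <sch:report test="matches(., concat('[', codepoints-to-string((34, 39)), ']'))">
      Don't use straight quotes.
    </sch:report>
  </sch:rule>
</sch:pattern>
```

The @test attribute contains no double quotes at all, so nothing in it needs escaping at the XML level.
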
--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431




Re: how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
Hi Sebastian,

On 27/01/2015 13:44, Sebastian Rahtz wrote:
> I have come to the conclusion that my remarks yesterday were simply rubbish, and that
> regexp expressions in XPath (taken from XSD) do not support a way of specifying a Unicode code point
> except using entities (which, as we know, don’t work in this case).
>
> back to <let>, I fear

...but that won't help in this case, will it? Unlike
<xsl:variable>['"]</xsl:variable>, <sch:let> only allows variable
assignment through a @value attribute, which has to be quoted, e.g.:

     <let name="string.a" value="'a'"/>

So the problem seems to remain if you want to include quotes in @value.

Best,

Ron

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
Hi Sebastian,

On 27/01/2015 14:05, [hidden email] wrote:

> On 27/01/2015 13:44, Sebastian Rahtz wrote:
>> back to <let>, I fear
>
> ...but that won't help in this case, will it? Unlike
> <xsl:variable>['"]</xsl:variable>, <sch:let> only allows variable
> assignment through a @value attribute, which has to be quoted, e.g.:
>
>     <let name="string.a" value="'a'"/>
>
> So the problem seems to remain if you want to include quotes in @value.

Ok, I think I have found a possible workaround with <let>.

To summarize the problem: I want to construct a regular expression for
matching /both/ single and double straight quotes with matches(). The
regular expression is simple enough: ['"], but problematic
     -when used literally as the pattern string in matches(), which must
be quoted
     -inside a @test attribute, which must be quoted
This causes problems because the quotes to be matched can't both be
written literally: an attribute value can't contain both single and
double quotes unescaped.

The canonical solution is indirection: declare the regular expression
(or the problematic characters) in a variable, and use that variable in
place of the literal pattern. The canonical XSLT suggestion is:

     <xsl:variable name="straight.quotes">['"]</xsl:variable>

...which can be used in e.g.: <xsl:when test="matches(.,
$straight.quotes)">. Because the quotes don't occur in an attribute
value, they can both be expressed literally, without need for escaping.

Unfortunately, the Schematron <let> element for declaring variables
doesn't allow text content, but requires that variables be declared in a
@value attribute. In this case, that's problematic, because of the
single and double quotes. Still, when used in isolation, they can be
properly escaped[1], allowing single and double quotation marks to be
defined as separate variables, that can be referred to elsewhere.

The most elegant solution I found was:

     <!-- define single and double quotes as separate variables -->
     <let name="apos" value='"&apos;"'/>
     <let name="quot" value="'&quot;'"/>
     <!-- combine single and double quotes in regular expression -->
     <let name="straight.quotes" value="concat('[', $apos, $quot, ']')"/>

...which can be used in a matches() function like:

     <report test="matches(., $straight.quotes)">

[1] Note: the @value attribute of Schematron's <let> element doesn't seem to allow
XSLT2-style escaping of quotes (for a brief summary of the options, with
a reference see http://www.w3.org/TR/xpath20/#id-literals,
http://stackoverflow.com/questions/2887281/escape-single-quote-in-xslt-concat-function#2888544),
hence the use of the character entities. I've found inspiration for this
way of expressing quotes in an attribute value at
http://stackoverflow.com/questions/7613521/xsl-escaping-an-apostrophe-during-xslwhen-test/7617563#7614426.

This seems useful: I think I'm able now to express our Schematron
constraints in ODD without need for XSLT elements. With the fix for
<let> and friends in place in the ODD processing stylesheets
(https://github.com/TEIC/Stylesheets/commit/63a2bd32c80f02c739c527ed92c205bb7bd8f583),
I think I'm ready to go.

Best,

Ron

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

ron.vandenbranden
Administrator
Ok,

I think I've found the holy grail after all. Consider this one-liner for
matching single and double quotes:

     <sch:report test="matches(., '[&quot;'']')"></sch:report>

This uses:
     -the character entity &quot; to escape the attribute value
delimiter (the double quote ")
     -the XPath 2.0 escape mechanism for quotes (the doubled single
quote '') to escape the string literal delimiter (the single quote ')
If only I had discovered http://markmail.org/message/7lyc5fvhkzfdzfhp 
earlier...

For completeness' sake: I was wrong when stating earlier
that:

On 28/01/2015 10:41, [hidden email] wrote:
>
> [1] Note: the @value attribute of Schematron's <let> element doesn't seem to allow
> XSLT2-style escaping of quotes (for a brief summary of the options,
> with a reference see http://www.w3.org/TR/xpath20/#id-literals,
> http://stackoverflow.com/questions/2887281/escape-single-quote-in-xslt-concat-function#2888544),
> hence the use of the character entities.

The XPath 2.0 escape syntax for quotes /is/ supported:

   <sch:let name="apos" value="''''"/>
   <sch:let name="quote" value='""""'/>

So, to come back to my original example: it can be re-formulated as:

   <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
     queryBinding="xslt2">
     <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

     <sch:pattern>
       <sch:rule context="tei:ref[@type eq 'bibl']">
         <sch:assert test="id(substring-after(@target, '#'))/self::tei:bibl">
           A bibliographic reference must point to an entry in the bibliography.
         </sch:assert>
       </sch:rule>
     </sch:pattern>
     <sch:pattern>
       <sch:rule context="text()">
         <sch:report test="matches(., '[&quot;'']')">
           Don't use straight quotes.
         </sch:report>
       </sch:rule>
     </sch:pattern>
   </sch:schema>

...and integrated as such in an ODD file. No need for <xsl:key> or even
<sch:let>. It. Just. Works.
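
A minimal sketch of what that integration might look like for the first rule. It makes several assumptions: the constraint is attached to an <elementSpec> for <ref>, the @ident value "bibl.target" is a made-up name, the ODD processor declares the tei: prefix for Schematron itself, and the @scheme value is the one accepted for ISO Schematron by the TEI release in use ("isoschematron" here; check your version's documentation).

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="ref" mode="change">
  <!-- "bibl.target" is a hypothetical identifier for this constraint -->
  <constraintSpec ident="bibl.target" scheme="isoschematron">
    <constraint>
      <sch:rule context="tei:ref[@type eq 'bibl']">
        <sch:assert test="id(substring-after(@target, '#'))/self::tei:bibl">
          A bibliographic reference must point to an entry in the bibliography.
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

The explicit <sch:rule> keeps the @type predicate from the stand-alone schema rather than relying on a context generated from the <elementSpec>.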

Though it will be nice to have <sch:let> available in TEI schemas. I'm
glad this is staged for an upcoming release of the TEI stylesheets
thanks to Sebastian's fix (see
https://github.com/TEIC/Stylesheets/commit/63a2bd32c80f02c739c527ed92c205bb7bd8f583).

Thanks for the help, though!

Best,

Ron

--
Ron Van den Branden

Technical Editor
jTEI  - Journal of the Text Encoding Initiative
         http://jtei.revues.org/

Re: how should <constraintSpec scheme="xsl"> be used in ODD?

Sebastian Rahtz-3
Oh! I had no idea that XPath 2.0 had that escape syntax!! Congratulations.

>        <sch:assert test="id(substring-after(@target, '#'))/self::tei:bibl">

can I suggest that id(substring(@target,2)) is slightly neater? (assuming you’ve
already checked that it starts with #)
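
A sketch of that variant, with the check folded into the rule context as one possible way of doing it (the extra predicate is an assumption, not something from the original schema):

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- the rule only fires for targets that start with '#',
       so dropping the first character is safe -->
  <sch:rule context="tei:ref[@type eq 'bibl'][starts-with(@target, '#')]">
    <sch:assert test="id(substring(@target, 2))/self::tei:bibl">
      A bibliographic reference must point to an entry in the bibliography.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

With this context, references whose @target does not start with '#' are skipped by this rule rather than reported, so they would need a separate check.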


--
Sebastian Rahtz      
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431


