can pure ODD define an alternation of attribute patterns?

Piotr Bański-2
Hi all,

I would like to define an alternation of patterns of up to three attributes, where the datatype of attribute.2 and attribute.3 is uniform and depends on the value of attribute.1.

Something like the following (in *pseudo-markup*):

<rng:choice>
   <rng:group>  "pointer",     data.pointer,            data.pointer            </rng:group>
   <rng:group>  "character",   data.nonNegativeInteger, data.nonNegativeInteger </rng:group>
   <rng:group>  "byte_offset", data.nonNegativeInteger, data.nonNegativeInteger </rng:group>
   <rng:group>  "time_in_sec", data.integer,            data.integer            </rng:group>
  .... etc.
</rng:choice>

I could do that in RNG, but can I do that in ODD as well?
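
For concreteness, the pseudo-markup above could be written out in RELAX NG roughly as follows. This is only a sketch under stated assumptions: the element name `anchorSpan` and the pattern name `att.triple` are made up for illustration, all three attributes are treated as required, and data.pointer is approximated by the W3C Schema datatype anyURI.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <!-- "anchorSpan" is a hypothetical element name, used only in this sketch -->
    <element name="anchorSpan">
      <ref name="att.triple"/>
      <text/>
    </element>
  </start>
  <!-- "att.triple" is a hypothetical pattern name -->
  <define name="att.triple">
    <!-- exactly one of the four groups must match as a whole -->
    <choice>
      <group>
        <attribute name="attribute.1"><value>pointer</value></attribute>
        <attribute name="attribute.2"><data type="anyURI"/></attribute>
        <attribute name="attribute.3"><data type="anyURI"/></attribute>
      </group>
      <group>
        <attribute name="attribute.1"><value>character</value></attribute>
        <attribute name="attribute.2"><data type="nonNegativeInteger"/></attribute>
        <attribute name="attribute.3"><data type="nonNegativeInteger"/></attribute>
      </group>
      <group>
        <attribute name="attribute.1"><value>byte_offset</value></attribute>
        <attribute name="attribute.2"><data type="nonNegativeInteger"/></attribute>
        <attribute name="attribute.3"><data type="nonNegativeInteger"/></attribute>
      </group>
      <group>
        <attribute name="attribute.1"><value>time_in_sec</value></attribute>
        <attribute name="attribute.2"><data type="integer"/></attribute>
        <attribute name="attribute.3"><data type="integer"/></attribute>
      </group>
    </choice>
  </define>
</grammar>
```

The same attribute name may appear in several branches of a choice, which is what makes the value of attribute.1 select the datatypes of the other two.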

If I understand correctly, ODD expects me to list the three attDef declarations in an <attList> for my new class, and to declare a new data type that groups the data types that I need for attribute.2 and attribute.3, and to make the value of attribute.1 a closed list, and then to slam a huuuge piece of Schematron onto this (hopefully inside this very class definition; I haven't checked that) that would attempt to rule out unwanted sequences.

Is my diagnosis correct? I probably wouldn't mind being wrong.

I also admit to having a rather hazy idea of the extent of the difference in "staticness" between listing potential patterns in an RNG schema on the one hand, and defining them in the ODD on the other. I mean, I am not really sure if I could <alternate> a series of <sequence>s containing <attDef>s. Would I end up defining a single attribute several times? Would/Should ODD allow me to do that?

Thanks in advance,

  Piotr


Re: can pure ODD define an alternation of attribute patterns?

Syd Bauman-10
I think the short answer is "no": Pure ODD can't do that. But to be
honest, I haven't tried. This is in part because I know the `roma`
code that handles <attList org="choice"> is broken. (See bug 144.[1])

But ODD can do this, albeit in a roundabout, hack kind of way. (And,
it turns out in writing my test, I had to add a hack to the hack to
get around what I presume is a bug in ODD processing.)

The attached ODD is an example of a method that at best is likely to
be considered controversial, at worst bad practice. But it does
exactly what (I think) you want. It re-names the element <hi> to be
<methodOne>, and gives it three new required
attributes:
  attribute.1 = "pointer" | "character" | "byte_offset" | "time_in_sec"
  attribute.2 =  anyURI       nonNegInt     nonNegInt       int
  attribute.3 =  anyURI       nonNegInt     nonNegInt       int

A few caveats, in no particular order:

 * The method used will give some people angina.

 * Because RELAX NG constructs are used directly, you do not have the
   advantage that the value of attribute.3 is defined as
   teidata.pointer when attribute.1 is "pointer"; rather, it is
   defined directly as anyURI.

 * Hack: I had to use <rng:interleave> to group the attribute
   definitions; for some reason when I tried <rng:group> (which is
   what one would naturally expect to use) `roma` converted it to an
   <rng:choice>.

 * I really doubt you can get usable XSD, and you certainly cannot get
   a helpful DTD, out of this.

So personally, I prefer the solution you recommended: just create a
closed list for @attribute.1, and then a small Schematron rule to
ensure that @attribute.2 and @attribute.3 are of the right datatype.

I have thrown that into the ODD as well, renaming <q> to be
<methodTwo>, and giving it three new required attributes named
"anotherAtt.1" etc.
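
To make the difference between the two methods concrete, here is a small instance fragment, offered as a sketch and not taken from the attached ODD. The attribute values are invented for illustration, and the comments describe what I would expect a validator to report, assuming the ODD compiles as intended.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- methodOne: the RELAX NG choice itself should accept this -->
  <methodOne attribute.1="character" attribute.2="0" attribute.3="17">ok</methodOne>

  <!-- methodOne: expected to be rejected by RELAX NG, since "-1" is not a
       nonNegativeInteger and no other branch allows attribute.1="character" -->
  <methodOne attribute.1="character" attribute.2="-1" attribute.3="17">bad</methodOne>

  <!-- methodTwo: expected to pass both RELAX NG and Schematron -->
  <methodTwo anotherAtt.1="time_in_sec" anotherAtt.2="-5" anotherAtt.3="12">ok</methodTwo>

  <!-- methodTwo: RELAX NG should accept this, because the shared datatype is a
       loose alternation; the Schematron rule is what should flag "-5" -->
  <methodTwo anotherAtt.1="byte_offset" anotherAtt.2="-5" anotherAtt.3="12">bad</methodTwo>
</p>
```

In other words, with the first method the grammar does all the work, while with the second the grammar is permissive and the co-occurrence constraint lives entirely in Schematron.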

Notes
-----
[1] https://github.com/TEIC/Stylesheets/issues/144


<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Test for Piotr Banski</title>
        <author xml:id="sb" ref="p:sbauman.emt">Syd Bauman</author>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished, just to be posted to TEI-L</p>
        <availability>
          <p>© 2017 Syd Bauman and Northeastern University Women
          Writer's Project. Available via CC 4.0 BY-SA.</p>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <p>No source, this ODD document is the original.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="T4PB" prefix="T4PB_" start="TEI div p">
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure" except="div1 div2 div3 div4 div5 div6 div7"/>
        <macroSpec ident="att1ptr" mode="add">
          <content>
            <rng:interleave>
              <rng:attribute name="attribute.1">
                <rng:value type="token">pointer</rng:value>
              </rng:attribute>
              <rng:attribute name="attribute.2">
                <rng:data type="anyURI"/>
              </rng:attribute>
              <rng:attribute name="attribute.3">
                <rng:data type="anyURI"/>
              </rng:attribute>
            </rng:interleave>
          </content>
        </macroSpec>
        <macroSpec ident="att1char" mode="add">
          <content>
            <rng:interleave>
              <rng:attribute name="attribute.1">
                <rng:value type="token">character</rng:value>
              </rng:attribute>
              <rng:attribute name="attribute.2">
                <rng:data type="nonNegativeInteger"/>
              </rng:attribute>
              <rng:attribute name="attribute.3">
                <rng:data type="nonNegativeInteger"/>
              </rng:attribute>
            </rng:interleave>
          </content>
        </macroSpec>
        <macroSpec ident="att1offset" mode="add">
          <content>
            <rng:interleave>
              <rng:attribute name="attribute.1">
                <rng:value type="token">byte_offset</rng:value>
              </rng:attribute>
              <rng:attribute name="attribute.2">
                <rng:data type="nonNegativeInteger"/>
              </rng:attribute>
              <rng:attribute name="attribute.3">
                <rng:data type="nonNegativeInteger"/>
              </rng:attribute>
            </rng:interleave>
          </content>
        </macroSpec>
        <macroSpec ident="att1seconds" mode="add">
          <content>
            <rng:interleave>
              <rng:attribute name="attribute.1">
                <rng:value type="token">time_in_sec</rng:value>
              </rng:attribute>
              <rng:attribute name="attribute.2">
                <rng:data type="integer"/>
              </rng:attribute>
              <rng:attribute name="attribute.3">
                <rng:data type="integer"/>
              </rng:attribute>
            </rng:interleave>
          </content>
        </macroSpec>
        <elementSpec ident="hi" mode="change">
          <altIdent>methodOne</altIdent>
          <content>
            <alternate minOccurs="1" maxOccurs="1">
              <macroRef key="att1ptr"/>
              <macroRef key="att1char"/>
              <macroRef key="att1offset"/>
              <macroRef key="att1seconds"/>
            </alternate>
            <macroRef key="macro.paraContent"/>
          </content>
          <remarks>
            <p>The attributes on this element ...</p>
          </remarks>
        </elementSpec>
        <!-- above is ODD hack; below is PureODD with Schematron -->
        <classSpec ident="att.aPureODDmethod" type="atts" mode="add">
          <constraintSpec scheme="schematron" ident="two-and-three-match-one">
            <constraint>
              <sch:rule context="*[@anotherAtt.1 eq 'pointer']">
                <sch:assert test="@anotherAtt.2 castable as xsd:anyURI">when @anotherAtt.1 is 'pointer' @anotherAtt.2 must be a URI</sch:assert>
                <sch:assert test="@anotherAtt.3 castable as xsd:anyURI">when @anotherAtt.1 is 'pointer' @anotherAtt.3 must be a URI</sch:assert>
              </sch:rule>
              <sch:rule context="*[@anotherAtt.1 = ('character','byte_offset')]">
                <sch:assert test="@anotherAtt.2 castable as xsd:nonNegativeInteger">when @anotherAtt.1 is '<sch:value-of select="@anotherAtt.1"/>' @anotherAtt.2 must be a counting number</sch:assert>
                <sch:assert test="@anotherAtt.3 castable as xsd:nonNegativeInteger">when @anotherAtt.1 is '<sch:value-of select="@anotherAtt.1"/>' @anotherAtt.3 must be a counting number</sch:assert>
              </sch:rule>
              <sch:rule context="*[@anotherAtt.1 eq 'time_in_sec']">
                <sch:assert test="@anotherAtt.2 castable as xsd:integer">when @anotherAtt.1 is 'time_in_sec' @anotherAtt.2 must be an integer</sch:assert>
                <sch:assert test="@anotherAtt.3 castable as xsd:integer">when @anotherAtt.1 is 'time_in_sec' @anotherAtt.3 must be an integer</sch:assert>
              </sch:rule>
            </constraint>
          </constraintSpec>
          <attList>
            <attDef ident="anotherAtt.1">
              <datatype minOccurs="1" maxOccurs="1">
                <dataRef key="att1types"/>
              </datatype>
            </attDef>
            <attDef ident="anotherAtt.2">
              <datatype minOccurs="1" maxOccurs="1">
                <dataRef key="att23types"/>
              </datatype>
            </attDef>
            <attDef ident="anotherAtt.3">
              <datatype minOccurs="1" maxOccurs="1">
                <dataRef key="att23types"/>
              </datatype>
            </attDef>
          </attList>
        </classSpec>
        <dataSpec ident="att1types">
          <content>
            <valList type="closed">
              <valItem ident="pointer"/>
              <valItem ident="character"/>
              <valItem ident="byte_offset"/>
              <valItem ident="time_in_sec"/>
            </valList>
          </content>
        </dataSpec>
        <dataSpec ident="att23types">
          <content>
            <alternate>
              <dataRef key="teidata.pointer"/>
              <dataRef key="teidata.count"/>
              <dataRef name="integer"/>
            </alternate>
          </content>
        </dataSpec>
        <elementSpec ident="q" mode="change">
          <altIdent>methodTwo</altIdent>
          <classes mode="change">
            <memberOf key="att.aPureODDmethod"/>
          </classes>
        </elementSpec>
      </schemaSpec>
    </body>
  </text>
</TEI>



Re: can pure ODD define an alternation of attribute patterns?

Piotr Bański
Thank you, Angel Syd.

I find your angina-inducing alternation of attribute lists very nifty,
and also illuminating. On the other hand, as you imply, I'm afraid that
pushing this solution at the Council (as part of a larger proposal)
could be a recipe for defeat.

So I will examine the pure solution closely now, and will try to
implement it tonight. What I was afraid of, there, was the potential
complexity (and the ensuing error-proneness) of the Schematron. But the
way you do it, with `castable` and a direct reference to the data type,
it looks a lot less painful than I expected.

Wow, thank you so much again!

   Piotr

On 23/03/17 18:42, Syd Bauman wrote:

> [...]

Re: can pure ODD define an alternation of attribute patterns?

Piotr Bański
In reply to this post by Syd Bauman-10
Hi Syd and all,

One more little thing, about the xsd:anyURI check: I can't get it to
work, and the problem appears to only concern xsd:anyURI, whether I use
(1) to go via the TEI layer or (2) directly.

(1) <dataRef key="teidata.pointer"/>

(2) <dataRef name="anyURI"/>

If I do, e.g.

<sch:assert test="@my_attribute castable as xsd:anyURI">,

I can put anything into the attribute value, and it won't get flagged by
Schematron. I have done these checks with other data types, directly or
indirectly, and they all worked (so it's not that I use wrong syntax or
anything of that sort), but the behaviour of anyURI is different.

Is there some insider info anyone would care to share on this, please?

I looked for other examples of that within TEI/P5 and in Stylesheets,
but wasn't able to find any.

While composing this message, I found the following passage in the W3C
spec on XSD datatypes, and I hope I interpret it wrongly when I think
that this may be the issue:

"Because it is impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the lead
of [RFC 2396] (as amended by [RFC 2732]) in this matter: such rules and
restrictions are not part of type validity and are not checked by
·minimally conforming· processors. Thus in practice the above definition
imposes only very modest obligations on ·minimally conforming· processors."

https://www.w3.org/TR/xmlschema-2/#anyURI

Would it be correct to assume that we're looking at a "minimally
conforming" behaviour here? My checks were tested in oXygen, but also
during the TEI/P5 build process, which also validates with Schematron.

And a practical question: should I rather validate against a regex than
use `castable as xsd:anyURI`?

Thanks in advance!

   Piotr


Re: can pure ODD define an alternation of attribute patterns?

Syd Bauman-10
Quick, perhaps incomplete response --

Piotr,

I often run afoul of the fact that most strings are valid as a URI.
E.g., all of the following are valid xsd:anyURIs, as far as I know:
 http://www.example.com/
 htp://www.example.com/
 htttp:/www.example.com/
 why:is:this_bloody_thing_a_(valid)_URI?

If I really want a test string to fail against xsd:anyURI, I put in
'%' signs without hex digits following. E.g., "a%bad%URI" should
fail.

But indeed, for this reason, you may very well want to compare against
a regex rather than use `castable as xsd:anyURI`.
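
A regex-based check might look something like the sketch below. The `ident` value and the regular expression are my own illustrative assumptions, not anything defined by TEI: the pattern only accepts absolute http: or https: URLs, so it would reject all of the odd strings listed above, but it would also reject relative pointers such as "#p1". Loosen or replace it to suit whatever your pointers really look like.

```xml
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                scheme="schematron" ident="pointer-atts-look-like-urls">
  <constraint>
    <sch:rule context="*[@anotherAtt.1 eq 'pointer']">
      <!-- hypothetical pattern: scheme http or https, a non-empty host part,
           then an optional path containing no whitespace -->
      <sch:assert test="matches(@anotherAtt.2, '^https?://[^\s/]+(/\S*)?$')">when @anotherAtt.1 is 'pointer' @anotherAtt.2 must look like an http(s) URL</sch:assert>
      <sch:assert test="matches(@anotherAtt.3, '^https?://[^\s/]+(/\S*)?$')">when @anotherAtt.1 is 'pointer' @anotherAtt.3 must look like an http(s) URL</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Like `castable as`, matches() needs an XPath 2.0 processor behind the Schematron.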

