how to remove numbered divs properly?


ron.vandenbranden
I'm having problems validating my files against a DTD / RELAX NG
schema from which I tried to remove the numbered divs. I did this with
Roma (online), by excluding the elements <div0>, <div1>, <div2>, <div3>,
<div4>, <div5>, <div6> and <div7> from the textstructure module.

In my DTD, this generates ambiguous content models for the following elements:

<!ELEMENT front ((%model.frontPart; |
%model.global;)*,(((%model.pLike.front;),(%model.pLike.front; |
titlePage | %model.global;)*) | (div,(div | %model.frontPart; |
%model.global;)*) | (%model.frontPart; | %model.global;)*)?)>
<!ELEMENT back ((%model.frontPart; | %model.global; |
%model.divWrapper;)*,((div,(div | %model.frontPart; | %model.global;)*)
| (%model.frontPart; | %model.global;)*)?,(%model.divWrapper.bottom;)*)>
<!ELEMENT body ((%model.divWrapper; |
%model.global;)*,((((%macro.component;),(%model.global;)*)+,((divGen,(%model.global;)*)*,((div,(div
| divGen | %model.global;)*) | (divGen | %model.global;)* | (divGen |
%model.global;)*)?)) | ((divGen,(%model.global;)*)*,((div,(div | divGen
| %model.global;)*) | (divGen | %model.global;)* | (divGen |
%model.global;)*))),(%model.divWrapper.bottom;)*)>

For lack of time, the quickest solution seemed to be copying those element
declarations into my generated DTD from the teilite.dtd (which does not
have numbered divs either) in the tei-exemplars package. Apart from the
fact that the TEI Lite definition for <front> does not seem to allow a
<front> containing only a <titlePage> (I had no time to investigate that
in more detail), I wonder if there is a more elegant way to remove
numbered divs without producing ambiguous content models at all?

Ron Van den Branden


Re: how to remove numbered divs properly?

Laurent Romary-2
Hi Ron,
It works for me, with both DTD and RELAX NG output. I started in
Roma with:
[Build schema (Create a new customisation by adding elements and
modules to the smallest recommended schema)]
and deleted all the (awful ;-)) numbered divs. The ODD file looks as
follows:

<?xml version="1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
     <teiHeader>
         <fileDesc>
             <titleStmt>
                 <title>My TEI Extension</title>
                 <author>generated by Roma 2.7</author>
             </titleStmt>
             <publicationStmt>
                 <p>for use by whoever wants it</p>
             </publicationStmt>
             <sourceDesc>
                 <p>created on Friday 15th December 2006 01:26:23 PM
by the form at
                     http://www.tei-c.org.uk/Roma/</p>
             </sourceDesc>
         </fileDesc>
     </teiHeader>
     <text>
         <front>
             <divGen type="toc"/>
         </front>
         <body>
             <p>My TEI Customization starts with modules tei, core,
header, and textstructure</p>
             <schemaSpec ident="myTei" xml:lang="en">
                 <moduleRef key="core"/>
                 <moduleRef key="tei"/>
                 <moduleRef key="header"/>
                 <moduleRef key="textstructure"/>
                 <elementSpec module="textstructure" ident="div0" mode="delete"/>
                 <elementSpec module="textstructure" ident="div1" mode="delete"/>
                 <elementSpec module="textstructure" ident="div2" mode="delete"/>
                 <elementSpec module="textstructure" ident="div3" mode="delete"/>
                 <elementSpec module="textstructure" ident="div4" mode="delete"/>
                 <elementSpec module="textstructure" ident="div5" mode="delete"/>
                 <elementSpec module="textstructure" ident="div6" mode="delete"/>
                 <elementSpec module="textstructure" ident="div7" mode="delete"/>
             </schemaSpec>
         </body>
     </text>
</TEI>
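
For anyone who would rather script the eight deletions than click through Roma, here is a minimal Python sketch that builds an equivalent schemaSpec fragment. It assumes only the module and element names shown in the ODD above; the function name build_schema_spec is made up for illustration, and the output still has to be wrapped in a full ODD file before Roma or the TEI stylesheets can use it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def build_schema_spec(ident="myTei"):
    """Build a schemaSpec that loads four modules and deletes div0..div7.

    Hypothetical helper: it mirrors the ODD fragment quoted in this thread
    and is not part of any TEI tool.
    """
    ET.register_namespace("", TEI_NS)
    spec = ET.Element(f"{{{TEI_NS}}}schemaSpec", {"ident": ident})
    for key in ("core", "tei", "header", "textstructure"):
        ET.SubElement(spec, f"{{{TEI_NS}}}moduleRef", {"key": key})
    for n in range(8):
        ET.SubElement(
            spec,
            f"{{{TEI_NS}}}elementSpec",
            {"module": "textstructure", "ident": f"div{n}", "mode": "delete"},
        )
    return spec


if __name__ == "__main__":
    # Print the fragment so it can be pasted into the <body> of an ODD file.
    print(ET.tostring(build_schema_spec(), encoding="unicode"))
```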




Re: how to remove numbered divs properly?

ron.vandenbranden
Thanks, Laurent,

When I generate a DTD from your ODD file, it contains ambiguous
element declarations for <body>, <back> and <front>.

At least, that is what my "xmllint --noout --valid test.xml" command
produces:
test.xml:15: validity error : Content model of front is not determinist:
((divGen | titlePage | index | milestone | pb | lb | cb | gap)* ,
(((head | byline | argument | epigraph | docTitle | titlePart |
docAuthor | docEdition | docImprint |  docDate) , (head | byline |
argument | epigraph | docTitle | titlePart | docAuthor | docEdition |
docImprint | docDate | titlePage | index | milestone | pb | lb | cb |
gap)*) | (div , (div | divGen | titlePage | index | milestone | pb | lb
| cb | gap)*) | (divGen | titlePage | index | milestone | pb | lb | cb |
gap)*)?)
 </front>

...when validating this example title page (copied from
http://www.tei-c.org/release/doc/tei-p5-doc/html/DS.html#DSTITL):
<!DOCTYPE front SYSTEM "myTei.dtd">
 <front>
  <titlePage>
   <docTitle>
    <titlePart type="main">Is There a Text in This Class?</titlePart>
    <titlePart type="sub">The Authority of Interpretive
Communities</titlePart>
   </docTitle>
   <docAuthor>Stanley Fish</docAuthor>
   <docImprint>
    <publisher>Harvard University Press</publisher>
    <pubPlace>Cambridge, Massachusetts</pubPlace>
    <pubPlace>London, England</pubPlace>
   </docImprint>
  </titlePage>
 </front>


Laurent Romary wrote:
> Hi Ron,
> It works for me, with both DTD and RELAX NG output. I started in
> Roma with:
So you don't get those errors?

Cheers,

Ron


Re: how to remove numbered divs properly?

Laurent Romary-2
Then it must come down to differences in the way the various tools
work: my oXygen works perfectly well with the DTD.
Best,
Laurent



Re: how to remove numbered divs properly?

Lou Burnard-5
Laurent Romary writes:
> Hi Ron,
> It works for me, with both DTD and RELAX NG output. I started in
...

Yes, it works for me too.

> > In lack of time, the quickest solution seemed copying in those
> > element declarations into my generated DTD from the teilite.dtd
> > (which does not have numbered divs either) in the tei-exemplars
> > package.

Exactly what I would suggest -- but if you look at the ODD for TEI Lite, you'll
see that it doesn't do anything special to front or back anyway, so you should
be getting the same result.


>Apart from the fact that the TEILite definition for
> > <front> does not seem to allow a <front> containing only a
> > <titlePage> (had no time to investigate that in more detail),


As aforesaid, TEI Lite doesn't change the definition for front at all, so if
that's true, it has always been the case.

> > I wonder if there is a more elegant way to remove numbered divs
> > without producing ambiguous content models at all?

Are you suggesting that ODD is inelegant?!

Having banged my head on this wall several times, I can assure you that there is
NO EASY WAY of producing unambiguous content models in TEIland in general. Those
pesky model.globals will always get you in the end.  We have several times tried
to find a satisfactory way of simplifying the content models for e.g. div and
body, but the current P5 state of affairs still needs more attention.

I suspect that in principle this is an intractable question, since any content
model which depends on something being there (e.g. titlePage in the case of
front) to prevent an ambiguity will keel over with a sickening groan as soon as
you delete that something. Consider a model like this:

(model.foo*, model.bar+, model.foo*)

This is unambiguous only for as long as model.bar actually has some members.

The current Roma does a great job of detecting and trying to sort out simple
cases like this, but I don't think it can catch everything.
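
To make the (model.foo*, model.bar+, model.foo*) point concrete, here is a rough Python sketch of the textbook position-automaton test for deterministic ("1-unambiguous") content models. It is not claimed to be what xmllint, onsgmls or Roma actually implement. The tuple encoding of content models, the function names, and the choice of lb/pb and div as stand-ins for model.foo and model.bar are all assumptions made for illustration, as is the idea that an emptied class simply drops out of the sequence.

```python
from itertools import count


def analyse(model):
    """Return (first, follow, symbol) position sets for a content model.

    A model is an element name (str) or a tuple: ("seq", ...), ("alt", ...),
    ("star", x), ("plus", x) or ("opt", x). This encoding is hypothetical.
    """
    symbol, follow, ids = {}, {}, count()

    def walk(node):
        # Returns (nullable, first positions, last positions) for node.
        if isinstance(node, str):
            p = next(ids)
            symbol[p], follow[p] = node, set()
            return False, {p}, {p}
        op, *kids = node
        if op in ("star", "plus", "opt"):
            nullable, first, last = walk(kids[0])
            if op != "opt":  # repetition: the end can loop back to the start
                for p in last:
                    follow[p] |= first
            return (nullable or op != "plus"), first, last
        parts = [walk(kid) for kid in kids]
        if op == "alt":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        if op == "seq":
            nullable, first, last = True, set(), set()
            for n, f, l in parts:
                for p in last:
                    follow[p] |= f
                if nullable:
                    first |= f
                last = (last | l) if n else set(l)
                nullable = nullable and n
            return nullable, first, last
        raise ValueError(f"unknown operator: {op!r}")

    _, first, _ = walk(model)
    return first, follow, symbol


def is_deterministic(model):
    """True if no choice point offers the same element name twice."""
    first, follow, symbol = analyse(model)
    for candidates in [first, *follow.values()]:
        names = [symbol[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True


FOO = ("alt", "lb", "pb")  # stand-in for model.foo
with_bar = ("seq", ("star", FOO), ("plus", "div"), ("star", FOO))
without_bar = ("seq", ("star", FOO), ("star", FOO))  # model.bar emptied

if __name__ == "__main__":
    print(is_deterministic(with_bar))     # True
    print(is_deterministic(without_bar))  # False
```

With a member in model.bar, the parser always knows whether an lb belongs before or after the div. Once model.bar is empty, the first lb could match either starred group, which is exactly the kind of complaint the stricter parsers raise.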



Re: how to remove numbered divs properly?

Syd Bauman
> Yes, it works for me too.

Indeed, oXygen fails to report the ambiguous content model for me,
too. I hope to have time to generate a test file and report this on
oxygen-users tomorrow.

Both xmllint and onsgmls flag the non-deterministic content models.
It is interesting to note the different error message philosophies.
xmllint gives me a maximum of 3 error messages, one for each of the 3
problematic content models, but only if the element actually occurs in
the instance (and since <body> is a required element, there's always
at least 1 error). The error message, however, is pretty unhelpful:
xmllint flags the line in the instance where the first occurrence of
the element ends, and gives the message "validity error : Content
model of [element] is not determinist:" followed by the content model
with parameter entity references expanded.

onsgmls, on the other hand, is completely thorough. It flags the
errors in the DTD, and generates an error message for each and every
possible combination of elements that is non-deterministic. E.g.
'content model is ambiguous: when the current token is the 7th
occurrence of "gap", both the 9th and 10th occurrences of "lb" are
possible'. While it's sometimes helpful to get that detail, the TEI
is a big DTD, and you end up with far too many error messages to wade
through. In this case, I get 3,874 error messages for 3 problematic
content models!
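
If all you want from xmllint is the list of afflicted elements, a small filter over its messages is enough. The following Python sketch assumes the message wording quoted earlier in this thread ("Content model of ... is not determinist"); the function name and the abbreviated sample text are made up for illustration, and other libxml2 versions may word the message differently.

```python
import re

# Wording taken from the xmllint message quoted in this thread; treat it as
# an assumption, since other versions may phrase the error differently.
PATTERN = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+): validity error : "
    r"Content model of (?P<element>\S+) is not determinist",
    re.MULTILINE,
)


def nondeterministic_elements(report):
    """Return (element, file, line) for each non-determinism message found."""
    return [
        (m["element"], m["file"], int(m["line"]))
        for m in PATTERN.finditer(report)
    ]


# Abbreviated sample; the real message continues with the expanded model.
sample = (
    "test.xml:15: validity error : Content model of front is not determinist:\n"
    "((divGen | titlePage | index | milestone | pb | lb | cb | gap)* , \n"
)

if __name__ == "__main__":
    print(nondeterministic_elements(sample))
```

In practice you would capture the messages from "xmllint --noout --valid test.xml" (they normally go to standard error) and pass that text to the function.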


> > > In lack of time, the quickest solution seemed copying in those
> > > element declarations into my generated DTD from the teilite.dtd
> > > (which does not have numbered divs either) in the tei-exemplars
> > > package.
>
> Exactly what I would suggest -- but if you look at the ODD for TEI
> Lite, you'll see that it doesn't do anything special to front or
> back anyway, so you should be getting the same result.

I'm not sure whether Ron meant he was copying the element
specifications from the TEI Lite ODD shipped in the exemplars package
(/usr/share/xml/tei/custom/odd/teilite.odd if you're using the Debian
directory structure) into his ODD, or copying the element
declarations from the TEI Lite DTD shipped in the exemplars package
(/usr/share/xml/tei/custom/schema/dtd/teilite.dtd) into his DTD that
was generated from his ODD.

In the former case, I agree completely that this is in general a very
good approach. In the latter case, it seems like a hack that is
probably not a very good idea, and in any case is not likely to be
something Council eventually calls conformant TEI.


> > >Apart from the fact that the TEILite definition for <front> does
> > >not seem to allow a <front> containing only a <titlePage> (had
> > >no time to investigate that in more detail),
>
> As aforesaid, TEILite doesnt change the definition for front at
> all, so if that's true, it has always been the case.

This is correct for the current development version of Lite, which
does not change the specification of <front>. The current release
version, however, does change it, and has exactly the problem Ron is
describing, which Lou fixed on 12-05.


> > > I wonder if there is a more elegant way to remove numbered divs
> > > without producing ambiguous content models at all?
> Having banged my head on this wall several times, I can assure you
> that there is NO EASY WAY of producing unambiguous content models
> in TEIland in general. ...
> (model.foo*, model.bar+, model.foo*)
> This is unambiguous only for as long as model.bar actually has some
> members.
> The current Roma does a great job of detecting and trying to sort
> out simple cases like this, but I don't think it can catch
> everything.

I think Lou is correct, there is no easy way. On the other hand,
there is at least one hack that is not so difficult, and it is
possible to replace the content models of the afflicted elements. The
latter solution has the disadvantage that if the TEI changes the
content model of one (or more) of those elements, you need to rebuild
your content model, too. (Which is a pain.)

I have whipped up a few ODDs for people to see how this is done. They
are not currently well enough written or tested to be considered
exemplary, so rather than being placed in the exemplars package,
I've put them up on the TEI wiki. See the "FAND" pages at
http://www.tei-c.org/wiki/index.php/Category:Customization.


Re: how to remove numbered divs properly?

George Bina
Hi Syd,

oXygen uses Xerces J as its default XML parser. Xerces does not check
for ambiguous content models in DTDs, but it handles them without
problems. You can also use LIBXML from within oXygen: it ships with
oXygen by default and can be invoked from the Custom Validation combo.
Just select LIBXML on a file that uses the DTD, and it will catch the
ambiguous content model in the DTD.

FWIW, here are the comments made by Tim Bray on this issue:
http://www.xml.com/axml/notes/Determinism.html
***
Deterministic Grammars

This stuff is not worth worrying about. This rule was inherited from
SGML; its inclusion in SGML was actually a design error. This was
retained in XML not only for compatibility with SGML (not quite a good
enough reason; I voted against it) but because some of the most popular
existing SGML tools actually rely on it for certain internal optimizations.

It's likely that quite a few XML products will never bother checking for
violations of this rule, because it's hard; if you're writing a DTD and
you get a complaint about a nondeterministic content model, then you
might find it worthwhile to read the appendix.
***

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com




Re: how to remove numbered divs properly?

Syd Bauman
> oXygen uses Xerces J as the default XML parser and Xerces does not
> check for ambiguous content models in DTDs but it handles them
> without problem. You can also use LIBXML from oXygen, it comes with
> oXygen by default and you can invoke it from the Custom Validation
> combo, just select LIBXML on a file that uses the DTD, that catches
> the ambiguous content model in the DTD.

Thank you for the information, George. (Guess I don't have to post to
oxygen-users :-)


> FWIW, here are the comments made by Tim Bray on this issue:
> [Summary: it's a silly rule.]

Yup. I completely understand (and at least partially sympathize with)
the political reasons why the rule was imposed. And it is my
understanding that matching against deterministic grammars can be
performed significantly faster than against non-deterministic
grammars. And Xerces J is evidence that Mr. Bray was right: some
processors won't bother to check -- the spec does not require that
they check. But nonetheless, the spec is quite clear: "it is an error
if an element in the document can match more than one occurrence of
an element type in the content model".

I don't like it, and that one clause is one of the major reasons I am
pretty anti-DTD. But there it is. Sigh.