Where to put <pb/>


Where to put <pb/>

Martin Mueller

If a <div> ends on one page and the next <div> begins on the next page, does it matter whether I put the <pb/> element between the divs or make it the first child of the next div? Both ways would mark the page break as a beginning, but one could (and perhaps should) mark the <pb/> element as preceding the next div. But does it matter?


Re: Where to put <pb/>

James Cummings-5
Hi Martin,

For me, if no aspect of the division holds over from one page to the other I would always put the <pb/> (page beginning) at the highest level in the hierarchy possible that still makes sense. So in this case I'd put it in between the two divisions. 
</div> <pb/> <div>  
It does of course depend on what you see as marking the creation of a division, but once there is something that signals it (or fits whatever non-renditional criteria you are using to form divisions whether present in the original document or not) then the division has truly started.

Many thanks,
James 
--
Dr James Cummings, [hidden email]
Senior Lecturer in Late-Medieval Literature and Digital Humanities
School of English, Newcastle University



Re: Where to put <pb/>

Grüntgens, Max

Dear Martin Mueller,
I would second James Cummings's answer.

Additionally, I would argue that your markup decision concerning <pb/>, like every markup decision that is valid against a schema, matters only in relation to your own research questions (and the operationalization of specific elements for further analysis), your general outlook on data modeling, and your plans for processing the marked-up texts and outputting them in HTML or in print.

In my opinion, the one important thing to aim for is consistency within a project.

And one can always relocate any or all <pb/> elements during later processing steps.
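
A minimal sketch of such a relocation step, using only Python's standard library. The function name `hoist_leading_pb` is hypothetical, and the sketch assumes un-namespaced element names for brevity (real TEI documents use the TEI namespace, so the tag comparisons would need adjusting).

```python
import xml.etree.ElementTree as ET


def hoist_leading_pb(parent):
    """Move a <pb/> that is the first child of a <div> to just before that <div>.

    Nested divs are handled first, so a page break tucked several levels
    deep ends up as high in the hierarchy as it can go.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        hoist_leading_pb(child)  # process nested divisions first
        if (
            child.tag == "div"
            and len(child) > 0
            and child[0].tag == "pb"
            and not (child.text or "").strip()  # no real text before the pb
        ):
            pb = child[0]
            child.remove(pb)
            child.text = pb.tail  # text that followed the pb stays in the div
            pb.tail = None
            parent.insert(i, pb)  # the div itself is now at index i + 1
            i += 1
        i += 1
    return parent


doc = ET.fromstring(
    '<body><div><p>a</p></div><div><pb n="2"/><head>B</head></div></body>'
)
hoist_leading_pb(doc)
print([child.tag for child in doc])
```

The opposite transformation (tucking a <pb/> into the following division) is equally mechanical, which is why consistency within a project matters more than the particular choice.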

Best regards

Max




Re: Where to put <pb/>

Paul Schaffner
As you know, Martin, our institutional preference for
making <pb/>s children of the <div> (i.e., 'tucking them
inside the div starting at the beginning of that page')
was always a practical one, not an intellectual one. It
made it far easier for extracted divs (extracted, say, for
serving and display, div by div) to 'carry their page information
with them.' In fact, given our non-XML-based retrieval system,
placing the <pb/>s between the divs effectively made them
invisible and unretrievable.
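
To make the practical point concrete, here is a small sketch (not the retrieval system described above, which is not XML-based) of what a div-by-div extraction can see under each convention. The function name and the sample documents are hypothetical, and element names are un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET


def first_page_of_each_div(body):
    """For each top-level <div>, return the @n of a <pb/> that is its first child, else None."""
    pages = []
    for div in body.findall("div"):
        first = div[0] if len(div) else None
        is_pb = first is not None and first.tag == "pb"
        pages.append(first.get("n") if is_pb else None)
    return pages


# Convention A: <pb/> tucked inside the div that starts on that page.
tucked = ET.fromstring(
    '<body><div><pb n="1"/><p>a</p></div><div><pb n="2"/><p>b</p></div></body>'
)
# Convention B: <pb/> placed between the divs.
between = ET.fromstring(
    '<body><pb n="1"/><div><p>a</p></div><pb n="2"/><div><p>b</p></div></body>'
)

print(first_page_of_each_div(tucked))
print(first_page_of_each_div(between))
```

With the tucked sample the call returns `['1', '2']`; with the between sample it returns `[None, None]`, because an extracted div no longer contains the page break that precedes it.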

pfs

--
Paul Schaffner  Digital Content & Collections
University of Michigan Libraries
[hidden email] | http://www.umich.edu/~pfs/

Re: Where to put <pb/>

Peter Boot-4
In reply to this post by Martin Mueller

Hi Martin,


There is a very practical reason for James' preference. Suppose

- you put the <pb/> as the first element of the second <div> and
- your rendering generates a header (perhaps in the margin) when a new div starts,
- but you also start a new HTML page for each new <pb>

(all quite natural in a manuscript-oriented rendering),

then your div header will end up on the previous page.

You can work around this of course, but then you have to ask whether there are non-whitespace text nodes between the <pb> and the opening tag of the <div>, and other un-XSLT-like things. You can, but it is something that you'd rather avoid.
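
The effect can be reproduced with a deliberately naive renderer. This is only a sketch in Python rather than XSLT; the function name, the bracketed header text, and the sample documents are hypothetical, and element names are un-namespaced.

```python
import xml.etree.ElementTree as ET


def render_pages(body):
    """Naive page-based rendering: every <pb/> starts a new page,
    and a generated header is emitted whenever a <div> opens."""
    pages = [[]]

    def walk(el):
        if el.tag == "pb":
            pages.append([])  # start a new output page
            return
        if el.tag == "div":
            pages[-1].append("[header for div " + el.get("n", "?") + "]")
        if el.text and el.text.strip():
            pages[-1].append(el.text.strip())
        for child in el:
            walk(child)
            if child.tail and child.tail.strip():
                pages[-1].append(child.tail.strip())

    walk(body)
    return pages


tucked = ET.fromstring(
    '<body><div n="1"><p>end of one</p></div>'
    '<div n="2"><pb n="2"/><p>start of two</p></div></body>'
)
between = ET.fromstring(
    '<body><div n="1"><p>end of one</p></div>'
    '<pb n="2"/><div n="2"><p>start of two</p></div></body>'
)

print(render_pages(tucked))
print(render_pages(between))
```

With the tucked sample, the header for the second div lands at the end of the first page, because the div opens before the page break is seen. With the between sample, the header starts the second page.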


Best,

Peter


(I just read Paul's answer, which argues for the opposite solution on practical grounds. But then he doesn't want page-based rendering.)




Re: Where to put <pb/>

James Cummings-5
Hi Peter,

Yes, that was just the kind of issue I was expecting. Paul's version is based on the processing he has in place, so pragmatically of course it makes sense to cater to it instead. One can of course move from a div-based encoding to a page-by-page based encoding using an XSLT stylesheet like https://github.com/TEIC/Stylesheets/blob/dev/tools/processpb.xsl.   

Many thanks,
James 
--
Dr James Cummings, [hidden email]
Senior Lecturer in Late-Medieval Literature and Digital Humanities
School of English, Newcastle University



Re: Where to put <pb/>

Andreas Wagner
In reply to this post by Paul Schaffner
Dear list,

for much the same practical reasons that have already been mentioned, we
also "tuck pbs inside divs". If this is of interest, in order to have a
consistent and unequivocal rule of operation, we have stated:

"In case a page, column, or line beginning occurs in conjunction with a
respectively encoded "conceptual" text element (e.g., head, p, div,
note, list, item, titlePage, titlePart, and others), the break element
(pb, cb, lb) is positioned as child of the first mixed-content element
occurring in the structure of this conjunction. In the event of more
than one break element (also including "anchor" elements such as
milestone) occurring conjunctly, the following order applies: pb, cb,
lb, milestone, other elements."

(https://salamanca.school/en/guidelines.html#breaks)
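
The ordering part of this rule is easy to check mechanically. A minimal sketch, assuming the co-occurring elements are given as a list of element names; `BREAK_ORDER` and `order_breaks` are hypothetical names, not part of the project's tooling.

```python
# Precedence stated in the rule: pb, cb, lb, milestone, then everything else.
BREAK_ORDER = {"pb": 0, "cb": 1, "lb": 2, "milestone": 3}


def order_breaks(names):
    """Sort co-occurring empty elements into the order the rule prescribes.

    sorted() is stable, so "other elements" keep their original relative order.
    """
    return sorted(names, key=lambda name: BREAK_ORDER.get(name, len(BREAK_ORDER)))


print(order_breaks(["lb", "anchor", "pb", "milestone", "cb"]))
```

For the sample list this yields `['pb', 'cb', 'lb', 'milestone', 'anchor']`.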


Best,
Andreas


--
Dr. Andreas Wagner                          twitter: @anwagnerdreas
Project "The School of Salamanca"           web: http://salamanca.adwmainz.de
Academy of Sciences and Literature, Mainz   fon: +49 (0)69/798-32774
and Institute of Philosophy                 fax: +49 (0)69/798-32794
Goethe University Frankfurt

IGF HP 25 / R 2.455
Norbert-Wollheim-Platz 1
60629 Frankfurt am Main

Re: Where to put <pb/>

Robinson, Peter
I’m very much with James’ formulation: put the <pb/> element as high up the hierarchy as you can.
There is even a theoretical basis for this, in the definition of a text as “an act of communication inscribed in a document”. This means that, as you cannot have a text OUTSIDE a document, the <pb> representing the document must precede (and hence “enclose”) the act of communication held by the document.
I seem to recall having this discussion before.
Peter


Re: Where to put <pb/>

Paul Schaffner
In reply to this post by Andreas Wagner
In the interest of sharing, here are ours (at least for this one project):

'Placement of <pb/> tags. The rules are: (1) "pages always break at the top"; that is, <pb/> tags will be inserted in the text at the actual location of the page break (i.e. usually the "top" of the page, but in any case the point at which a reader turns to this page), regardless of the location of the page number on the printed page. (2) "Divisions begin at page breaks; they don't end there"; that is, if a structural break of some kind coincides with the page break (e.g., if a new section (<div>), paragraph, stanza, etc., begins at the head of the new page), the <pb/> tag should be tucked inside the opening tag for the first structural element, neither inside the closing tag for the old division nor between the two divisions. And (3) "Words cannot break at page breaks"; that is, if a hyphenated word straddles a page break, finish the word and any attached punctuation, then insert the <pb/> tag. Treat the hyphen as any other end-of-line hyphen.
Note: (4) in parallel texts, material on a single page is often recorded at widely separated points in the data stream (once in each parallel <div>). In that case, the <pb/> tag, including the page number, should be repeated, i.e., recorded in both <div>s.'

Rule 1 is probably standard, rule 2 controversial, rule 3 perverse, and rule 4 perhaps unexpected.

pfs

--
Paul Schaffner  Digital Content & Collections
University of Michigan Libraries
[hidden email] | http://www.umich.edu/~pfs/

Re: Where to put <pb/>

Mylonas, Elli
Hi all - I am very much in agreement with the posters who want to place the <pb/> outside the elements that it separates. It's cleaner, and it's a better representation of the document structure. But it can cause problems with the rendering, especially since page breaks may occur between <div>s and also inside them. It's not difficult if you only want a milestone marker to appear indicating the page break, but it's hard to associate it with content, and it's also a pain if, as Paul describes, you also need some kind of page-like formatting.

Very much agree that Paul's rule 3 is perverse, and have total sympathy with it, as well. 

So, I'm in the <pb> as high in the hierarchy as possible camp. 

best, --elli


Re: Where to put <pb/>

lou
In reply to this post by Paul Schaffner
FWIW, I don't think rule 3 (re-assemble a word ahead of a <pb/> which interrupts it)  is perverse. It makes very good sense if you are encoding the document rather than its typography: much more true to the text than the "controversial" rule 2 (if page and div boundaries coincide do the div first). And I think rule 4 is not so much unexpected as bonkers.



Re: Where to put <pb/>

Paul Schaffner
Hmm,

Rule 4 is chiefly to accommodate parallel-column texts and the
like, as in this example in which the terms of an oath are
spread across several pages, first in Latin (left column),
then in English (right column), and therefore produce
a text stream that proceeds in sequence (Latin) page1 - page2 -
page3 - (English) page1 - page2 - page3.

  http://www-personal.umich.edu/~pfs/tcp/queries/parallel.jpg

which I would encode roughly thus (using numbered
divs to clarify the hierarchy, and ignoring the
'tuck inside the div' rule for the moment). If
you *don't* repeat the <pb/> when the text
stream returns to the first page, what *do*
you do? (and yes, I realize that you could break
the text at the paragraph level rather than at
the language level in order to keep the pages
entire, but you can't always, and let's say
we don't want to).

<pb n="1"/>
 <div1>
 <head>The Form of the Oaths...</head>
  <div2 type="version">
  <head>In Latin: Formula Juramenti...</head>
    <div3 type="preamble">
      <p>EGo A.B....</p>
    </div3>
<pb n="2"/>
    <div3 type="clause" n="1">
      <head>I.</head>
      <p>Quod Navis...</p>
    </div3>
    <div3 type="clause" n="2">
      <head>II.</head>
      <p>Quod revera ... </p>
    </div3>
<pb n="3"/>
    <div3 type="clause" n="3">
      <head>III.</head>
      <p>Quod nullam...</p>
    </div3>
    <div3 type="clause" n="4">
      <head>IV.</head>
      <p>Quod supradicta...</p>
    </div3>
  </div2>
<pb n="1"/>
  <div2 type="version">
  <head>In English. The Form of the Oath...</head>
    <div3 type="preamble">
      <p>I A.B. .. </p>
    </div3>
<pb n="2"/>
    <div3 type="clause" n="1">
      <head>I.</head>
      <p>That the Ship...,</p>
    </div3>
    <div3 type="clause" n="2">
      <head>II.</head>
      <p>That I have really...</p>
    </div3>
<pb n="3"/>
    <div3 type="clause" n="3">
      <head>III.</head>
      <p>That I have not...</p>
    </div3>
    <div3 type="clause" n="4">
      <head>IV.</head>
      <p>That the said Ship...</p>
    </div3>
 </div2>
</div1>

pfs

ps I also realize that this is getting beyond Martin's
original question, but the situation is a common one and
probably worth addressing now and again to garner a sense
of what people do.
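
One consequence of repeating the <pb/> is that a processor can reassemble each physical page by grouping on the page number. A minimal sketch of that idea, assuming a trimmed version of the encoding above wrapped in a hypothetical <text> root; the function name `text_by_page` is also hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed version of the parallel-column example; <text> is an assumed wrapper.
SAMPLE = """<text>
<pb n="1"/>
<div1>
  <div2 type="version">
    <div3 type="preamble"><p>EGo A.B.</p></div3>
<pb n="2"/>
    <div3 type="clause" n="1"><p>Quod Navis</p></div3>
  </div2>
<pb n="1"/>
  <div2 type="version">
    <div3 type="preamble"><p>I A.B.</p></div3>
<pb n="2"/>
    <div3 type="clause" n="1"><p>That the Ship</p></div3>
  </div2>
</div1>
</text>"""


def text_by_page(root):
    """Collect paragraph text under the page number of the most recent <pb/>.

    Element.iter() walks in document order, so a repeated <pb n="1"/>
    simply switches the current page back to page 1.
    """
    pages = defaultdict(list)
    current = None
    for el in root.iter():
        if el.tag == "pb":
            current = el.get("n")
        elif el.tag == "p" and el.text:
            pages[current].append(el.text.strip())
    return dict(pages)


print(text_by_page(ET.fromstring(SAMPLE)))
```

For this sample, page "1" collects both preambles (Latin, then English) and page "2" collects both first clauses, which is the physical page content in column order.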
   


--
Paul Schaffner  Digital Content & Collections
University of Michigan Libraries
[hidden email] | http://www.umich.edu/~pfs/

Re: Where to put <pb/>

Stuart A. Yeates
At the NZETC we 'adjust' <pb>s dynamically, as one of the first
steps in almost all processing activities.

The clearest example of the rationale for this is a footnote which is
split across physical pages. The page break occurs twice in the
logical text, once in the body text and once in the footnote text. How
you process that is highly dependent on exactly what you're doing with
the text.

cheers
stuart
--
...let us be heard from red core to black sky

On Fri, 23 Oct 2020 at 07:50, Paul Schaffner <[hidden email]> wrote:

>
> Hmm,
>
> Rule 4 is chiefly to accommodate parallel-column texts and the
> like, as in this example in which the terms of an oath are
> spread across several pages, first in Latin (left column),
> then in English (right column), and therefore produce
> a text stream that proceeds in sequence (Latin) page1 - page2 -
> page3 - (English) page1 - page2 - page3.
>
>   http://www-personal.umich.edu/~pfs/tcp/queries/parallel.jpg
>
> which I would encode roughly thus (using numbered
> divs to clarify the hierarchy, and ignoring the
> 'tuck inside the div' rule for the moment). If
> you *don't* repeat the <pb/> when the text
> stream returns to the first page, what *do*
> you do? (and yes, I realize that you could break
> the text at the paragraph level rather than at
> the language level in order to keep the pages
> entire, but you can't always, and let's say
> we don't want to).
>
> <pb n="1"/>
>  <div1>
>  <head>The Form of the Oaths...</head>
>   <div2 type="version">
>   <head>In Latin: Formula Juramenti...</head>
>     <div3 type="preamble">
>       <p>EGo A.B....</p>
>     </div3>
> <pb n="2"/>
>     <div3 type="clause" n="1">
>       <head>I.</head>
>       <p>Quod Navis...</p>
>     </div3>
>     <div3 type="clause" n="2">
>       <head>II.</head>
>       <p>Quod revera ... </p>
>     </div3>
> <pb n="3"/>
>     <div3 type="clause" n="3">
>       <head>III.</head>
>       <p>Quod nullam...</p>
>     </div3>
>     <div3 type="clause" n="4">
>       <head>IV.</head>
>       <p>Quod supradicta...</p>
>     </div3>
>   </div2>
> <pb n="1"/>
>   <div2 type="version">
>   <head>In English. The Form of the Oath...</head>
>     <div3 type="preamble">
>       <p>I A.B. .. </p>
>     </div3>
> <pb n="2"/>
>     <div3 type="clause" n="1">
>       <head>I.</head>
>       <p>That the Ship...,</p>
>     </div3>
>     <div3 type="clause" n="2">
>       <head>II.</head>
>       <p>That I have really...</p>
>     </div3>
> <pb n="3"/>
>     <div3 type="clause" n="3">
>       <head>III.</head>
>       <p>That I have not...</p>
>     </div3>
>     <div3 type="clause" n="4">
>       <head>IV.</head>
>       <p>That the said Ship...</p>
>     </div3>
>  </div2>
> </div1>
>
> pfs
>
> ps I also realize that this is getting beyond Martin's
> original question, but the situation is a common one and
> probably worth addressing now and again to garner a sense
> of what people do.
>
>
>
> On Thu, Oct 22, 2020, at 13:22, Lou Burnard wrote:
> > FWIW, I don't think rule 3 (re-assemble a word ahead of a <pb/> which
> > interrupts it)  is perverse. It makes very good sense if you are
> > encoding the document rather than its typography: much more true to the
> > text than the "controversial" rule 2 (if page and div boundaries
> > coincide do the div first). And I think rule 4 is not so much
> > unexpected as bonkers.
> >
> >
> > On Thu, 22 Oct 2020 at 17:55, Paul Schaffner <[hidden email]> wrote:
> > > In the interest of sharing, here are ours (at least for this one project):
> > >
> > > > 'Placement of <pb/> tags. The rules are: (1) "pages always break at the top"; that is, <pb/> tags will be inserted in the text at the actual location of the page break (i.e. usually the "top" of the page, but in any case the point at which a reader turns to this page), regardless of the location of the page number on the printed page. (2) "Divisions begin at page breaks; they don't end there"; that is, if a structural break of some kind coincides with the page break (e.g., if a new section (<div>), paragraph, stanza, etc., begins at the head of the new page), the <pb/> tag should be tucked inside the opening tag for the first structural element, neither inside the closing tag for the old division nor between the two divisions. And (3) "Words cannot break at page breaks"; that is, if a hyphenated word straddles a page break, finish the word and any attached punctuation, then insert the <pb/> tag. Treat the hyphen as any other end-of-line hyphen.
> > > Note: (4) in parallel texts, material on a single page is often recorded at widely separated points in the data stream (once in each parallel <div>). In that case, the <pb/> tag, including the page number, should be repeated, i.e., recorded in both <div>s.'
> > >
> > > Rule 1 is probably standard, rule 2 controversial, rule 3 perverse, and rule 4 perhaps unexpected.
> > >
> > > pfs
> > >
> > > On Thu, Oct 22, 2020, at 12:40, Andreas Wagner wrote:
> > > > Dear list,
> > > >
> > > > for much the same practical reasons that have already been mentioned, we
> > > > also "tuck pbs inside divs". If this is of interest, in order to have a
> > > > consistent and unequivocal rule of operation, we have stated:
> > > >
> > > > "In case a page, column, or line beginning occurs in conjunction with a
> > > > respectively encoded "conceptual" text element (e.g., head, p, div,
> > > > note, list, item, titlePage, titlePart, and others), the break element
> > > > (pb, cb, lb) is positioned as child of the first mixed-content element
> > > > occurring in the structure of this conjunction. In the event of more
> > > > than one break element (also including "anchor" elements such as
> > > > milestone) occurring conjunctly, the following order applies: pb, cb,
> > > > lb, milestone, other elements."
> > > >
> > > > (https://salamanca.school/en/guidelines.html#breaks)
> > > >
> > > >
> > > > Best,
> > > > Andreas
> > > >
> > > >
> > > > --
> > > > Dr. Andreas Wagner                          twitter: @anwagnerdreas
> > > > Project "The School of Salamanca"           web: http://salamanca.adwmainz.de
> > > > Academy of Sciences and Literature, Mainz   fon: +49 (0)69/798-32774
> > > > and Institute of Philosophy                 fax: +49 (0)69/798-32794
> > > > Goethe University Frankfurt
> > > >
> > > > IGF HP 25 / R 2.455
> > > > Norbert-Wollheim-Platz 1
> > > > 60629 Frankfurt am Main
> > > >
> > >
> > > --
> > > Paul Schaffner  Digital Content & Collections
> > > University of Michigan Libraries
> > > [hidden email] | http://www.umich.edu/~pfs/
>
> --
> Paul Schaffner  Digital Content & Collections
> University of Michigan Libraries
> [hidden email] | http://www.umich.edu/~pfs/

Re: Where to put <pb/>

Maarten Janssen
Just to put a more computationally driven spin on this: when rendering information related to pages, you have to deal with two things. First, when rendering a part of the text, you need to know which page it belongs to, especially when the page is linked to a facsimile image. Second, when displaying a page, you have to be able to render everything that appears on that page. For the first, it is easier if the <pb> appears at a predictable level: since a <pb> will often necessarily fall inside a <div>, it is easier if they are always inside a <div>, including when the page and the <div> begin at the same point. (Of course, a framework like TEITOK cannot assume that people do that and so cannot rely on it; for all intents and purposes the placement makes no difference there, and you just have to look back in the XML to the first preceding <pb> element, independent of its position in the hierarchy.) For the second, it would be easier to render the XML of the page if the <div> came after the <pb>; again, only if this is done systematically, and in principle only when the <div> also ends on that page, since otherwise you get only an XML fragment anyway.
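
The "look back to the first preceding <pb>" lookup can be sketched in a few lines of Python. This is only an illustration, not how TEITOK works: the sample document, the omission of the TEI namespace, and the function name page_of_each are all assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one <pb/> sits between divs, one is tucked inside a div.
SAMPLE = """<body>
  <pb n="1"/>
  <div xml:id="d1"><p xml:id="p1">First.</p></div>
  <div xml:id="d2"><pb n="2"/><p xml:id="p2">Second.</p></div>
</body>"""


def page_of_each(root):
    """Map each xml:id to the @n of the nearest preceding <pb/>, at any depth."""
    pages = {}
    current = None
    for el in root.iter():  # document order, regardless of nesting level
        if el.tag == "pb":
            current = el.get("n")
        elif el.get(XML_ID) is not None:
            pages[el.get(XML_ID)] = current
    return pages


print(page_of_each(ET.fromstring(SAMPLE)))
# {'d1': '1', 'p1': '1', 'd2': '1', 'p2': '2'}
```

Note that the paragraph p2 is found on page 2 wherever the <pb/> sits, whereas the div d2 that has the <pb/> tucked inside it is reported as starting on page 1.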

As for avoiding word-breaking <pb>: that does make some sense if you want to render the content of the page, but I would say it is bad practice. It forces an interpretation in which broken words are always flushed to the next page, while in practice you sometimes want exactly the text of the page, even if it ends in half a word, and you might want to change your position and flush broken words to the previous page instead. You can do all of that if you know exactly where the new page starts, and not if you un-break words. Of course, that is much easier in a tokenized TEI environment like TEITOK, since you know whether a <pb> is inside a word or not (and what the full word it breaks is), but I would say it is still best practice in non-tokenized TEI as well. Also, if you avoid words across a <pb> you probably do the same with words across an <lb>, which in my opinion sort of voids the reason why <pb> and <lb> are empty elements.
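
As a rough illustration of why the exact position is worth keeping, the Python sketch below derives all three readings (exact page text, broken word flushed to the next page, broken word flushed to the previous page) from one word-internal <pb/>. The sample paragraph, the policy names and the function split_at_pb are hypothetical; it assumes un-namespaced XML, a <pb/> that is the first child element of the paragraph, and no hyphen recorded at the break.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the word "example" is split by the page break.
SAMPLE = '<p>a broken exam<pb n="2"/>ple here</p>'


def split_at_pb(p, policy="exact"):
    """Return (text before the <pb/>, text after it) under one of three policies."""
    pb = p.find("pb")
    before, after = p.text or "", pb.tail or ""
    # The break is word-internal when non-space characters touch it on both sides.
    broken = bool(re.search(r"\S$", before)) and bool(re.match(r"\S", after))
    if broken and policy == "next":
        frag = re.search(r"\S+$", before).group()  # trailing fragment of the old page
        before, after = before[: -len(frag)], frag + after
    elif broken and policy == "previous":
        frag = re.match(r"\S+", after).group()  # leading fragment of the new page
        before, after = before + frag, after[len(frag):]
    return before, after


p = ET.fromstring(SAMPLE)
for policy in ("exact", "next", "previous"):
    print(policy, split_at_pb(p, policy))
```

If the encoding had already un-broken the word, only one of the three readings would remain recoverable.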

As for having multiple <pb> in case there is a footnote running across multiple pages: at least in our approach, that would make things more difficult rather than easier, since both rendering a page and looking up the image get harder when you have to take into account that a single <pb> appears twice. So in those cases I would always argue for two separate footnotes, linked by @next (and @prev where needed).

SV: Where to put <pb/>

Sigfrid Lundberg
In reply to this post by Paul Schaffner
My friends and I build a single service (https://tekster.kb.dk -- it is in Danish, sorry) based on multiple corpora with different encoding practices. I have learned a lot by navigating and searching those.

Most of you seem to assume that a digital edition of a text is based on one, and only one, source (witness or reading). One edition we are about to (re-)publish is Søren Kierkegaard's collected works. Enten-Eller (Either-Or), in two volumes, is based on seven witnesses, and hence we have page breaks from all seven in the text, each with a @wit attribute, an idref to the <witList>. We could (if we had bothered to complicate the UI) present search results for each of them or make a concordance between all of them.

The page break is the typographer's (or the scribe's) contribution. It isn't in the original manuscript (which will have page breaks of its own, even if it's born in a digital word processor). However, presenting text isn't a slide show, so never use the page as the indivisible atom of a text; the <div>, <lg>, <sp> and <p> are that. There is a fair chance that the author created them. The page break should be an addressable point in the text.

When indexing a generic TEI-lite kind of document for collections that have facsimiles, this is how I figure out which <pb/> a given <div>, <lg>, <sp> or <p> belongs to:

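  <!-- Returns the <pb/> that the element $here is taken to belong to. -->
  <xsl:function name="me:right_kind_of_page">
    <xsl:param name="here"/>
    <xsl:param name="root"/> <!-- not used in this excerpt -->
    <xsl:choose>
      <!-- Prefer the nearest preceding <pb/> that has a facsimile and is not marked missing. -->
      <xsl:when test="$here/preceding::t:pb[@facs and not(@rend = 'missing')]">
        <xsl:copy-of select="$here/preceding::t:pb[@facs and not(@rend = 'missing')][1]"/>
      </xsl:when>
      <!-- Otherwise fall back to the first such <pb/> inside the element. -->
      <xsl:when test="$here/descendant::t:pb[@facs and not(@rend = 'missing')]">
        <xsl:copy-of select="$here/descendant::t:pb[@facs and not(@rend = 'missing')][1]"/>
      </xsl:when>
      <!-- No usable page break was found. -->
      <xsl:otherwise>
        <xsl:value-of select="false()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>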

Note that we have to check both that a facsimile is present (@facs) and that the page is not marked as missing.

I suppose this is close to Paul's rule (1). I have a version of this function that filters by @wit; since all the editions should be accessible in the same way, I opt for the lemma page breaks.

Yours,

Sigfrid




Re: Where to put <pb/>

Magdalena Turska

In TEI Publisher we distinguish 3 main ways (=views) to navigate through the document

  • single: no subdivisions (entire document or part thereof)
  • div: subdivisions are structural units, div elements in TEI, section elements in DocBook
  • page: subdivisions are reconstructed fragments corresponding to a physical page, so chunks between pb elements in TEI, makes no sense in DocBook


From this perspective, basic logical structure is present in any document and is assumed to be a dominant hierarchy. It is both convenient for processing and easy to formulate an encoding Guideline to say that all milestone elements are nested within the structural markup. It is also not incorrect, I think, to assume that milestone elements (like page break coinciding with a new chapter) are not hanging in some kind of logical void but belong to a logical container. I enlist in the pb-in-a-div camp.
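
To make the "page" view concrete, here is a minimal Python sketch that regroups a document's text into chunks between <pb/> elements, whatever level each <pb/> sits at. It is not TEI Publisher's implementation: it returns plain text rather than reconstructed XML fragments, and the sample document, the missing TEI namespace and the name text_by_page are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: page breaks between divs and inside a paragraph.
SAMPLE = """<body>
  <pb n="1"/>
  <div><head>One</head><p>Alpha <pb n="2"/>beta</p></div>
  <pb n="3"/>
  <div><head>Two</head><p>Gamma</p></div>
</body>"""


def text_by_page(root):
    """Group the text nodes of the document under the @n of the page they fall on."""
    pages = {}
    current = None

    def add(text):
        # Skip whitespace-only nodes; normalise internal whitespace.
        if text and text.strip():
            pages.setdefault(current, []).append(" ".join(text.split()))

    def walk(el):
        nonlocal current
        if el.tag == "pb":
            current = el.get("n")
        add(el.text)
        for child in el:
            walk(child)
            add(child.tail)  # text following the child, in document order

    walk(root)
    return pages


print(text_by_page(ET.fromstring(SAMPLE)))
# {'1': ['One', 'Alpha'], '2': ['beta'], '3': ['Two', 'Gamma']}
```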


Magdalena



Re: Where to put <pb/>

Alexey Lavrentev-2

Dear all,

FWIW, I follow James' principle in my encoding practice, that is <quote>put the <pb/> (page beginning) at the highest level</quote>.

The first reason is that I find it easier to process. For instance, when creating a table of contents, I can use a simple XPath expression, preceding::tei:pb[1]/@n, to provide the page number for each division.

If the <pb/> were included in the <div>, the rule would be more complicated, and I don't think that the function proposed by Sigfrid would work for me: it would return the preceding <pb/> if it exists, and won't check if there is a descendant <pb/> before the first word of the division.
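
The more complicated rule can be sketched as follows in Python: take a <pb/> that occurs inside the division before any of its text, and only otherwise fall back to the nearest preceding <pb/>. This is a sketch under assumptions (un-namespaced sample XML, hypothetical function names leading_pb and toc), not a tested production routine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mixing both conventions.
SAMPLE = """<body>
  <pb n="1"/>
  <div><head>One</head><p>text</p></div>
  <pb n="2"/>
  <div><head>Two</head><p>text</p></div>
  <div><pb n="3"/><head>Three</head><p>text <pb n="4"/>more</p></div>
  <div><head>Four</head><p>text</p></div>
</body>"""


def leading_pb(el):
    """Return the first <pb/> inside el that precedes all of el's text, or None."""
    if el.tag == "pb":
        return el
    if (el.text or "").strip():
        return None
    for child in el:
        found = leading_pb(child)
        if found is not None:
            return found
        # Any text in or after this child means the division has already begun.
        if "".join(child.itertext()).strip() or (child.tail or "").strip():
            return None
    return None


def toc(root):
    """List (heading, start page) for every <div>, whichever way <pb/> was placed."""
    entries = []
    current = None
    for el in root.iter():  # document order
        if el.tag == "pb":
            current = el.get("n")
        elif el.tag == "div":
            lead = leading_pb(el)
            page = lead.get("n") if lead is not None else current
            entries.append((el.findtext("head"), page))
    return entries


print(toc(ET.fromstring(SAMPLE)))
# [('One', '1'), ('Two', '2'), ('Three', '3'), ('Four', '4')]
```

With preceding::pb[1]/@n alone, the third division would be listed on page 2.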

The second reason is more abstract. If I place a <pb/> as first child of a <div>, it means to me that the division starts before the beginning of the page, which is not the case.

As for Paul's 3rd rule <quote>Words cannot break at page breaks</quote>, we used to apply it to facilitate linguistic processing, but more recently, thanks to @break, we managed to reconcile the word-breaking <pb/>s with tokenization. However, there is still a problem if <fw> is used to encode the page footer or header...

Best regards,

Alexei


Re: Where to put <pb/>

Bauman, Syd
In reply to this post by Robinson, Peter
I seem to recall having this discussion before.
Yes. See https://listserv.brown.edu/cgi-bin/wa?A2=ind1402&L=TEI-L&P=98175 for what I had hoped would be the definitive answer.

A somewhat shorter and modified version follows.

The correct encoding is between the <div>s. There are reasons (like Paul’s “I have separated the <div> from its surroundings in a non-XML processing system, and do not have anyplace to store a snippet of metadata about said free-floating <div>”) to encode otherwise, but it is generally better (IMHO) to encode truthfully.

Examine your intuition on the matter. Let’s take [1] as an example. I think we can all agree that SECTION 2 of this (quite awful) edition of Flatland starts on page 6, not immediately before the beginning of page 6.

But more importantly, consider the processing. If we encode the <pb> where it falls, between <div>s, then for any simple element in the document we can answer the question “on what page do I start?” with the same XPath: preceding::pb[1]/@n.[2,3] If we instead position the <pb> at the beginning of the following <div>, we can no longer use that XPath for every (simple) element, as we cannot use it for <div>s that happen to start on a new page. For those we need to use descendant::pb[1]/@n (although one can easily imagine that in almost all cases child:: instead of descendant:: would do just fine). Neither XPath is difficult; what is difficult is determining when one should be used over the other. It’s pretty easy when you want to know what page an element other than <div> is on. But when you want to find out what page a <div> is on, you first have to determine whether the <div> started on a new page or not. That can be difficult.

Notes
[2] Presuming the page number is stored on @n, and ignoring complicating factors like multiple representations of the same page beginning or <gap> elements, etc. And I say “simple element” in order to exclude, e.g., those that have @part or @next or @prev, or (like <interp>) are not really on a page at all.
[3] Note that answering the question “on what page do I end?” is a little tougher. I have not tested it, but I think (descendant::pb[last()],preceding::pb[1])[1]/@n might do the trick.
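
For the “on what page do I end?” question, a small Python sketch may help. It assumes that an element with no page break inside it ends on the page where it starts, so it takes the last <pb/> inside the element and otherwise falls back to the nearest preceding one. The sample document, the un-namespaced XML and the function name end_pages are hypothetical, and the complicating factors listed in note [2] are ignored.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: d1 and p2 each contain a page break; p1 and p3 do not.
SAMPLE = """<body>
  <pb n="5"/>
  <div xml:id="d1"><p xml:id="p1">one</p><pb n="6"/><p xml:id="p2">two <pb n="7"/>three</p></div>
  <p xml:id="p3">four</p>
</body>"""


def end_pages(root):
    """Map each xml:id to the @n of the page on which that element ends."""
    ends = {}

    def walk(el, current):
        # Returns the page in effect once el has been fully read.
        if el.tag == "pb":
            return el.get("n")
        for child in el:
            current = walk(child, current)
        if el.get(XML_ID) is not None:
            ends[el.get(XML_ID)] = current
        return current

    walk(root, None)
    return ends


print(end_pages(ET.fromstring(SAMPLE)))
# {'p1': '5', 'p2': '7', 'd1': '7', 'p3': '7'}
```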