IIIF and facs

IIIF and facs

Ben Brumfield
Dear Colleagues,

Two weeks ago, Patrick Cuba, John Howard, Peter Robinson, Jeffrey Witt and I organized a discussion session on Connecting Text and IIIF at the IIIF Conference at the Vatican. While each of us presented a different perspective in our lightning talks, we agree on the need for the TEI community to be involved in conversations about modeling text in IIIF.

My own talk, "Text Beyond Annotations", is online at
http://content.fromthepage.com/text-beyond-annotations-at-iiif-vatican/

I'd be interested in discussing best practices for linking from TEI documents to page facsimiles hosted on IIIF image services. At the moment, I think the only option we have is to put the URL of a maximum-resolution image into the facs attribute of pb. I'd like to preserve that option for TEI viewers that don't support IIIF, but is there anything better we could do?
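A minimal sketch of that fallback, assuming an IIIF Image API 2.x service whose base URL (below) is hypothetical. The facs value is an ordinary full-size image request of the form {base}/full/full/0/default.jpg, so a viewer that knows nothing about IIIF can still display it, while a IIIF-aware tool can strip the last four path segments to recover the service base (and from it {base}/info.json). The helper names are illustrative only, not part of any TEI or IIIF tooling.

```python
from xml.sax.saxutils import quoteattr

def iiif_image_url(base, region="full", size="full", rotation="0",
                   quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x request:
    {base}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"

def pb_with_facs(base, n):
    """Serialize a TEI <pb/> whose @facs is a plain full-size image URL."""
    return f"<pb n={quoteattr(n)} facs={quoteattr(iiif_image_url(base))}/>"

def service_base(image_url):
    """Recover the service base from a request URL built as above
    (drops the region/size/rotation/quality.format segments)."""
    return image_url.rsplit("/", 4)[0]

# Hypothetical image service for one manuscript page.
SERVICE = "https://iiif.example.org/iiif/2/ms1-page1"
pb = pb_with_facs(SERVICE, "1r")
print(pb)
# <pb n="1r" facs="https://iiif.example.org/iiif/2/ms1-page1/full/full/0/default.jpg"/>
print(service_base(iiif_image_url(SERVICE)) + "/info.json")
```

The recovery step only works if every facs URL follows the same five-segment pattern, which is an assumption a project would have to enforce itself.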

Ben

Ben W. Brumfield
Partner, Brumfield Labs
Creators of FromThePage

Re: IIIF and facs

Janusz S. Bien
Quote - Ben Brumfield <[hidden email]> (Sun 18 Jun 2017 01:24:34 PM CEST):


> I'd be interested in discussing best practices for linking from TEI
> documents to page facsimiles hosted on IIIF image services.

My question is off topic, but is IIIF in any way better than DjVu?
You can easily create a URL to a fragment of a scan represented as a
DjVu document. Do I understand correctly that this is not possible in IIIF?

Best regards

Janusz

--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[hidden email], [hidden email], http://fleksem.klf.uw.edu.pl/~jsbien/

Re: IIIF and facs

Martin Holmes
In reply to this post by Ben Brumfield
Hi Ben,

I'd say there's a great deal more you can do than simply using pb/@facs
to point at the highest-res image; the Representation of Primary Sources
chapter has examples of using <surface> and <zone> to link components of
a transcription to areas on an image, and of linking to multiple images
at different resolutions:

<facsimile>
  <graphic url="page1.png"/>
  <surface>
   <graphic url="page2-highRes.png"/>
   <graphic url="page2-lowRes.png"/>
  </surface>
  <graphic url="page3.png"/>
  <graphic url="page4.png"/>
</facsimile>

<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHFAX>
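A minimal sketch, using Python's standard xml.etree.ElementTree and assuming local "#id" pointers in @facs, of how a display tool might follow pb/@facs and lb/@facs to the surface and zone they point at. The sample document, ids, and coordinates are invented for illustration; a real project would also need to handle external pointers.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, abbreviated sample (no teiHeader): one surface carrying two
# resolutions of the same page, plus a zone for the first written line.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="p2" ulx="0" uly="0" lrx="2000" lry="3000">
      <graphic url="page2-highRes.png"/>
      <graphic url="page2-lowRes.png"/>
      <zone xml:id="p2-l1" ulx="150" uly="200" lrx="1850" lry="280"/>
    </surface>
  </facsimile>
  <text><body>
    <pb facs="#p2"/>
    <p><lb facs="#p2-l1"/>First line of page two</p>
  </body></text>
</TEI>"""

def index_by_id(root):
    """Map every xml:id in the document to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

def resolve_facs(el, ids):
    """Follow a local @facs pointer ("#id"); return None if absent or external."""
    ref = el.get("facs", "")
    return ids.get(ref[1:]) if ref.startswith("#") else None

root = ET.fromstring(SAMPLE)
ids = index_by_id(root)

# Page break -> surface -> all available images of that page.
surface = resolve_facs(root.find(f".//{TEI}pb"), ids)
urls = [g.get("url") for g in surface.findall(f"{TEI}graphic")]
print(urls)  # ['page2-highRes.png', 'page2-lowRes.png']

# Line beginning -> zone -> rectangle on the surface.
zone = resolve_facs(root.find(f".//{TEI}lb"), ids)
print({k: zone.get(k) for k in ("ulx", "uly", "lrx", "lry")})
```

Keeping both graphic elements under one surface lets the viewer, not the encoding, decide which resolution to fetch.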

Cheers,
Martin

On 2017-06-18 04:24 AM, Ben Brumfield wrote:

Validation error with <note> as child of <imprint> in <biblStruct>

Charles Muller-5
The Guidelines say that <note> can be a child of <imprint>, but when I
try to do this using tei_all.rng (attempted in both oXygen and Emacs),
I get a validation error.

I guess that either the Guidelines or the schema is wrong, but in any case
I think it would be good to allow a <note> inside <imprint>, so that one
can write <note>PhD diss.</note>.

Regards,

Chuck

--

---------------------------
A. Charles Muller

Graduate School of Humanities and Sociology
Faculty of Letters
University of Tokyo
7-3-1 Hongō, Bunkyō-ku
Tokyo 113-8654, Japan

Office Phone: 03-5841-3735

Web Site: Resources for East Asian Language and Thought
http://www.acmuller.net

Twitter: @H_Buddhism

Re: Validation error with <note> as child of <imprint> in <biblStruct>

Charles Muller-5
Here's an example for the below problem:

http://www.acmuller.net/biblStruct-test.xml

Regards,

Chuck

Re: Validation error with <note> as child of <imprint> in <biblStruct>

Peter Stadler
Hi Chuck,

note is indeed allowed within imprint, but only after at least one element from model.imprintPart or model.dateLike (such as pubPlace, publisher, or date). So, in your example, if you move the note after the date, there should be no more complaints.
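As a rough illustration of that ordering rule (not a substitute for validating against tei_all.rng), the sketch below flags any note that appears in imprint before the first element from an assumed subset of model.imprintPart and model.dateLike. The sample imprints are hypothetical, not Chuck's actual file.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Assumed subset of model.imprintPart and model.dateLike; check the
# Guidelines for the complete, current class membership.
GROUP_STARTERS = {"pubPlace", "publisher", "distributor", "date", "time"}

def misplaced_notes(imprint):
    """Return <note> children of <imprint> that occur before the first
    imprint part or date, i.e. the position described above as invalid."""
    bad = []
    for child in imprint:
        name = child.tag.replace(TEI, "")
        if name in GROUP_STARTERS:
            break  # once a group has started, notes may follow
        if name == "note":
            bad.append(child)
    return bad

# Hypothetical samples (not Chuck's actual file).
BAD = """<imprint xmlns="http://www.tei-c.org/ns/1.0">
  <note>PhD diss.</note>
  <date>1993</date>
</imprint>"""
GOOD = """<imprint xmlns="http://www.tei-c.org/ns/1.0">
  <date>1993</date>
  <note>PhD diss.</note>
</imprint>"""

for label, src in (("note first", BAD), ("note after date", GOOD)):
    print(label, "->", len(misplaced_notes(ET.fromstring(src))), "misplaced")
```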

Best
Peter


On 19.06.2017 at 06:26, Charles Muller <[hidden email]> wrote:

Re: Validation error with <note> as child of <imprint> in <biblStruct>

Charles Muller-5
Dear Peter,

> note is indeed allowed within imprint, but only after some other
> information (from model.imprintPart or model.dateLike). So, in your
> example, if you move the note after the date there should be no more
> complaints.

Ah, I see!

Thank you,

Chuck

Re: IIIF and facs (and TEI)

Christian-Emil Ore-4
In reply to this post by Martin Holmes
Hi
We are looking for a decent viewer for the facsimiles of Henrik Ibsen's manuscripts, whose texts were all transcribed as TEI XML documents by the large project Henrik Ibsen's Writings, mostly in the 1990s. It is clear that something like the Universal Viewer (http://universalviewer.io/) may do the job. This is a IIIF thing. I studied the IIIF specification before I read Ben's report from the IIIF conference. It is a straightforward match for viewing the facsimiles, but it is harder to add advanced (meta)data outside the simple Open Annotation universe. I read Ben's restored Vatican talk and also the notes indicating Peter Robinson's view. A text is not a series of pages. In any case, I assume that it is easy to link from a viewer of TEI-encoded text to an instance of the Universal Viewer, but maybe not so easy the other way round. The question is whether the IIIF data model is well suited to modelling texts in the way the TEI recommends.
I would be interested in participating in a discussion about this.

Best
Christian-Emil

_____________________
From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Martin Holmes <[hidden email]>
Sent: 18 June 2017 19:27
To: [hidden email]
Subject: Re: IIIF and facs

Re: IIIF and facs (and TEI)

C. M. Sperberg-McQueen
Christian-Emil and others,

On the topic of links from page images to texts, how much utility
would there be in providing one link for each page image, which
goes to the location in the text where the material at the beginning
of that page appears?  (Concretely, I guess, you’d implement this as
a link from the page image to the pb element that points to that
page; if for complicated reasons there is more than one pb
signaling the start of that page, you either find a way to support
multi-ended links (good luck, but Eric van der Vlist had a nice Balisage
paper on the subject; it’s not impossible), or you choose one of
the pb elements to link to.)
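One way the "one link per page image" idea might be realized, sketched with Python's standard ElementTree: it assumes each pb carries an xml:id and a plain image URL in @facs, and it takes the "choose one of the pb elements" option by keeping the first pb for each image. The sample text and image paths are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def page_image_links(root):
    """Map each page-image URL (pb/@facs) to the xml:id of the first <pb>
    that points at it; later <pb> elements for the same image are ignored."""
    links = {}
    for pb in root.iter(f"{TEI}pb"):  # document order
        facs, pb_id = pb.get("facs"), pb.get(XML_ID)
        if facs and pb_id:
            links.setdefault(facs, pb_id)
    return links

# Hypothetical sample: two <pb> elements signal the start of page 2.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
  <pb xml:id="pb1" facs="img/p1.jpg"/><p>...</p>
  <pb xml:id="pb2" facs="img/p2.jpg"/><p>...</p>
  <pb xml:id="pb2b" facs="img/p2.jpg"/><p>...</p>
</body></text>"""

print(page_image_links(ET.fromstring(SAMPLE)))
# {'img/p1.jpg': 'pb1', 'img/p2.jpg': 'pb2'}
```

An image viewer could then turn each entry into a single link such as text.html#pb2 (the target page name is again hypothetical).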

If one wants to go from any written word on the page to the
transcription of that word, then of course a single link per page
is not going to make one happy.  (But I don’t know — won’t an
IIIF client interpret any click in an image as a request to zoom in
on that point?  If that is so, then you won’t be able to have word-
or textblock-level linking from image to text anyway.  But I don’t
have experience deploying IIIF, only a slowly ulcerating item on my
to-do list telling me I need to see whether IIIF is feasible for a
given project.  I am pathetically grateful to those who have raised
the question here.)

It may be helpful if the image-to-text link goes to a display of
the text which preserves the lineation of the original (if such a
display is possible), so that it’s easier to find one’s way around.
From there, other links to go to other display formats for the text
can use whatever user interface functions the project already
provides for that kind of thing.

I hope you (and others) will keep the list posted on any progress
or any experiments you undertake.

best,

Michael

 

> On Jun 23, 2017, at 1:38 PM, Christian-Emil Smith Ore <[hidden email]> wrote:

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
[hidden email]
http://www.blackmesatech.com
********************************************

Re: IIIF and facs (and TEI)

Christian-Emil Ore-4
I sent the discussion to Robert Sanderson and asked for comments. In case he is not following the TEI-list, I will report back.

Best,
Christian-Emil
________________________________________
From: C. M. Sperberg-McQueen <[hidden email]>
Sent: 24 June 2017 04:50
To: Christian-Emil Smith Ore
Cc: C. M. Sperberg-McQueen; [hidden email]
Subject: Re: IIIF and facs (and TEI)

Re: IIIF and facs (and TEI)

Georg Vogeler-2
Dear everybody,

just to add another possible point of reference (and person) to this: the
OCR/HTR community (in particular the Transkribus project) starts its
perception of text from the surface of the page and has developed an
elaborate encoding scheme for this (called, surprisingly enough, PAGE:
http://www.primaresearch.org/schema/PAGE/). Dominique Stutzmann is
convinced that everything which can be expressed with PAGE can also be
expressed with TEI, so he might wish to contribute to this discussion.

I am not too familiar with IIIF myself, but as far as I can tell,
addressing a single zone on an image is part of the API, and thus one
should be able to convert everything you can express with tei:zone
constructs (embedded transcriptions, references between image and
text) into IIIF as well.
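For the simplest case this conversion is plain arithmetic. The sketch assumes a rectangular zone whose ulx/uly/lrx/lry are given in the pixel coordinates of the full-size image (TEI lets a surface define its own coordinate grid, which would first need scaling, and @points polygons have no direct rectangular equivalent). It produces an IIIF Image API 2.x region parameter "x,y,w,h"; the service URL is hypothetical.

```python
def zone_to_region(ulx, uly, lrx, lry):
    """Convert TEI <zone> corners (upper-left, lower-right, in image pixels)
    into an IIIF Image API region parameter "x,y,w,h"."""
    if lrx <= ulx or lry <= uly:
        raise ValueError("lower-right corner must lie below and right of upper-left")
    return f"{ulx},{uly},{lrx - ulx},{lry - uly}"

def region_url(base, zone):
    """Request only that region, at full size, using Image API 2.x syntax."""
    region = zone_to_region(*zone)
    return f"{base.rstrip('/')}/{region}/full/0/default.jpg"

# Hypothetical service and zone given as (ulx, uly, lrx, lry).
print(region_url("https://iiif.example.org/iiif/2/ms1-page1", (150, 200, 1850, 280)))
# https://iiif.example.org/iiif/2/ms1-page1/150,200,1700,80/full/0/default.jpg
```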

Curious what the other experiences will be!

Georg

On 24.06.2017 at 05:24, Christian-Emil Smith Ore wrote:

Re: IIIF and facs (and TEI)

Robinson, Peter
In reply to this post by Christian-Emil Ore-4
Time for a little context, I think.

The IIIF community is large, growing, and multifaceted (sound familiar, anyone?). For some time now, several of us (beginning with Domhnall Ó h’Éigheartaigh, Patrick Cuba, myself and various others) have been looking at how IIIF and complex texts might play together. This group now includes (among others) John Bryant, Ben Brumfield (whose post on this list sparked this discussion), Jeffrey Witt, John Howard, Raffaele Viglianti and Nick Laiacona. Many of us were at the recent IIIF conference in Rome, where we presented a series of ruminations on the potential (great!), the technical issues (multiple) and the possible strategies (far too many) for linking complicated texts, which typically reference information extending far beyond IIIF's page-based model, with IIIF.

No firm answers yet. Anyone who wants to join our group as we wrestle with all this should email any one of us (we are all on the distribution list of this email). I can imagine that at some point (the November TEI members' meeting?) the TEI itself might want to look at linkages betwixt TEI and IIIF.

Peter

On Jun 23, 2017, at 9:38 PM, Christian-Emil Smith Ore <[hidden email]> wrote:

>
>
> Hi
> We looking for a decent viewer for  the facsimiles of Henrik Ibsens manuscripts where all texts ar transcribed as TEI xml-documetns by the large project Henrik Ibsen's writings mostly in the 1990s. It is clear that something like the universal viewer (http://universalviewer.io/)  may do the job. This is a IIIF thing. I studied the specification of IIIF before I read Ben’s report from the IIIF. It is an easy match to view the facsimiles, but it is harder to add advanced (meta)data outside the simple open-annotation-universe. I read Ben’s restored Vatican talk and also the notes indicating Peter Robinson’s view. A text is not a series of pages. In any case I assume that it is easy to link from some viewer of tei-xml-encoded text to an instance of the universal viewer, but may be not so easy the other way round. The question is whether the data model in IIIF is well suited for modelling texts in the way TEI recommend.
> I will be interested in participating in a discussion about this.
>
> Best
> Christian-Emil
>
> _____________________
> From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Martin Holmes <[hidden email]>
> Sent: 18 June 2017 19:27
> To: [hidden email]
> Subject: Re: IIIF and facs
>
> Hi Ben,
>
> I'd say there's a great deal more you can do than simply using pb/@facs
> to point at the highest-res image; the Representation of Primary Sources
> chapter has examples of using <surface> and <zone> to link components of
> a transcription to areas on an image, and of linking to multiple images
> at different resolutions:
>
> <facsimile>
>  <graphic url="page1.png"/>
>  <surface>
>   <graphic url="page2-highRes.png"/>
>   <graphic url="page2-lowRes.png"/>
>  </surface>
>  <graphic url="page3.png"/>
>  <graphic url="page4.png"/>
> </facsimile>
>
> <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHFAX>
>
> Cheers,
> Martin
>
> On 2017-06-18 04:24 AM, Ben Brumfield wrote:
>> Dear Colleagues,
>>
>> Two weeks ago, Patrick Cuba, John Howard, Peter Robinson, Jeffrey Witt
>> and I organized a discussion session on Connecting Text and IIIF at the
>> IIIF Conference at the Vatican.  While we each have different
>> perspectives expressed by our lightning talks, we agree on the need for
>> the TEI community to be involved in conversations about modeling text in
>> IIIF.
>>
>> My own talk, "Text Beyond Annotations" is online at
>> http://content.fromthepage.com/text-beyond-annotations-at-iiif-vatican/
>>
>> I'd be interested in discussing best practices for linking from TEI
>> documents to page facsimiles hosted on IIIF image services.  At the
>> moment I think that the only option we have is to insert a URL to a
>> maximum-resolution image into the *facs* attribute of *pb*.  I'd like to
>> preserve that option for TEI viewers that don't support IIIF, but is
>> there anything better we could do?
>>
>> Ben
>>
>> Ben W. Brumfield
>> Partner, Brumfield Labs
>> Creators of FromThePage <https://fromthepage.com/>


Re: IIIF and facs (and TEI)

Stutzmann Dominique-2
Dear Peter, Georg, and all,

Since Georg invited me to contribute, here are my thoughts on several connected issues, especially formats (TEI, PAGE, IIIF), granularity, text and image, text as annotation and its visualisation, and software engineering.


1) Formats for linking text and image

a) TEI and annotation coordinates

In several projects, colleagues from linguistics and palaeography, including Alexei Lavrentiev and me, have felt the need to link images closely with (analyzed) text at the level of words and characters.

This led us to specify a stand-off TEI format to deal with <facsimile> and <text>. The format developed in the Oriflamms project is described (in French) here:
(part 1) http://oriflamms.hypotheses.org/1442
(part 2)  http://oriflamms.hypotheses.org/1510

The main principles are that
- the texts are encoded in TEI (or teiCorpus > TEI) > text and tokenized with <c> and <w> elements carrying @xml:id
- the corresponding <facsimile> and <zone> declarations are in separate files, with @xml:id
- there is one file per image linking the @xml:id from the textual content (at character, word, line, column, page level) to the graphic content.

These <facsimile> declarations are stored in one file per image (in a distinct folder), and we create <zone> elements for the page (as stated in the discussion, several pages may be reproduced on one image), columns, lines, words and characters or punctuation. A word can cross a page, column or line break. A character can cross a word boundary (this is quite rare, but it happens, e.g. an st ligature across two words).
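The three principles above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the Oriflamms schema itself (the blog posts linked above document that). The sample text, zone coordinates and identifiers are hypothetical, and namespaces are omitted. The alignment is faked by pairing words and zones in order, where a real project would use an aligner or manual annotation.

```python
import xml.etree.ElementTree as ET

# ElementTree parses xml:id into the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical tokenized transcription (file 1): each <w> carries an @xml:id.
TEXT = """<text><body><p>
  <w xml:id="w1">In</w> <w xml:id="w2">principio</w>
</p></body></text>"""

# Hypothetical facsimile declarations for one image (file 2): one <zone> per word.
FACS = """<facsimile><surface xml:id="img1">
  <zone xml:id="z1" ulx="10" uly="20" lrx="40" lry="50"/>
  <zone xml:id="z2" ulx="45" uly="20" lrx="160" lry="50"/>
</surface></facsimile>"""

def build_link_grp(word_to_zone):
    """Build the stand-off linking file (file 3): one <link> per word/zone pair."""
    grp = ET.Element("linkGrp", {"type": "alignment"})
    for word_id, zone_id in word_to_zone.items():
        ET.SubElement(grp, "link", {"target": f"#{word_id} #{zone_id}"})
    return grp

text = ET.fromstring(TEXT)
facs = ET.fromstring(FACS)
word_ids = [w.get(XML_ID) for w in text.iter("w")]
zone_ids = [z.get(XML_ID) for z in facs.iter("zone")]
# Placeholder alignment: pair words and zones in order.
links = build_link_grp(dict(zip(word_ids, zone_ids)))
print(ET.tostring(links, encoding="unicode"))
```

Keeping the pairing in its own file means the text and the image analysis can be revised independently.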

Several corpora in this format are on the project's GitHub instance: https://github.com/oriflamms (start with https://github.com/oriflamms/Test_Fontenay/).


b) PAGE and TEI

The PAGE format is dedicated to describing what is on a page. It does have some structured information that TEI cannot render in the same structured way. This naturally follows from PAGE being an image-oriented format and TEI a text-oriented one. The description of the layout is very fine-grained in PAGE: for example, the attributes @colourDepth and @bgColour record "The colour bit depth required for the region" and "The background colour of the region".
One can transfer this type of information to TEI, but often only in an unstructured or non-explicit way.

For example, the PAGE attribute @indented on RegionType may correspond to:
- @rend="indent" at different possible levels: the level of layout (<pb/>, <cb/>, though applying @rend="indent" to these elements would be an abuse), the level of textual analysis (<p> or <l>), or a more neutral level: PAGE is neutral and provides no analysis, so converting a region should create <ab> rather than <p>.
- in a fully aligned text (TEI text + facsimile): the shorter first lines of paragraphs.
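A minimal sketch of the neutral mapping suggested above. It assumes a simplified, namespace-free PAGE-like fragment; real PAGE files use the PRImA namespace and nest TextLine elements inside each region. Each region becomes an <ab> rather than a <p>, and @indented becomes @rend="indent". Pointing @facs at a zone with the same id as the region is a hypothetical convention, used here only for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified PAGE-like input: no namespace, region-level TextEquiv only.
PAGE = """<Page>
  <TextRegion id="r1" indented="true"><TextEquiv><Unicode>Incipit liber</Unicode></TextEquiv></TextRegion>
  <TextRegion id="r2"><TextEquiv><Unicode>primus</Unicode></TextEquiv></TextRegion>
</Page>"""

def region_to_ab(region):
    """Convert one text region to a neutral TEI <ab>. PAGE records layout,
    not textual analysis, so we do not claim the block is a paragraph."""
    # Assumption: a TEI <zone> shares the region's id, so @facs can point to it.
    ab = ET.Element("ab", {"facs": "#" + region.get("id")})
    # xs:boolean allows both "true" and "1".
    if region.get("indented") in ("true", "1"):
        ab.set("rend", "indent")
    ab.text = region.findtext("TextEquiv/Unicode", default="")
    return ab

page = ET.fromstring(PAGE)
blocks = [region_to_ab(r) for r in page.iter("TextRegion")]
for block in blocks:
    print(ET.tostring(block, encoding="unicode"))
```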

The main difference, from an intellectual perspective, is that PAGE is used to store data from HTR or OCR, so any element of "understanding" has to be encoded additionally: for example, the reading order, which is implicit in TEI but has to be made explicit in PAGE (<ReadingOrder>).
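In TEI the reading order is simply document order, whereas a PAGE consumer must reconstruct it from <ReadingOrder>. The sketch below shows that reconstruction. It assumes a simplified, namespace-free fragment with a single flat <OrderedGroup>; real files may nest ordered and unordered groups, which it does not handle. The region ids and texts are hypothetical. Note that the gloss comes first in document order here, so reading the regions in file order would give the wrong sequence.

```python
import xml.etree.ElementTree as ET

# Simplified PAGE-like input; region ids and texts are hypothetical.
PAGE = """<Page>
  <ReadingOrder><OrderedGroup id="ro1">
    <RegionRefIndexed index="1" regionRef="r_gloss"/>
    <RegionRefIndexed index="0" regionRef="r_main"/>
  </OrderedGroup></ReadingOrder>
  <TextRegion id="r_gloss"><TextEquiv><Unicode>glossa</Unicode></TextEquiv></TextRegion>
  <TextRegion id="r_main"><TextEquiv><Unicode>textus</Unicode></TextEquiv></TextRegion>
</Page>"""

def regions_in_reading_order(page):
    """Return region texts in the order declared by <ReadingOrder>.
    Handles a single flat OrderedGroup only."""
    texts = {r.get("id"): r.findtext("TextEquiv/Unicode", default="")
             for r in page.iter("TextRegion")}
    refs = sorted(page.iter("RegionRefIndexed"),
                  key=lambda ref: int(ref.get("index")))
    return [texts[ref.get("regionRef")] for ref in refs]

print(regions_in_reading_order(ET.fromstring(PAGE)))  # ['textus', 'glossa']
```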

As a matter of fact, in all the PAGE files that I have seen, there was no information that we could not transfer straightforwardly from one format to the other without using unstructured <desc> elements. This could require additions to TEI (while remembering what the "T" stands for in "TEI").

c) IIIF and TEI
As Ben Brumfield's excellent contribution shows, one may find it hard to see the upside of a very verbose format that stores very small bits of information without being able to encode them in the full sense of the word, that is, to analyze them. One strength is that IIIF (like PAGE) makes the order of text elements (via annotations) explicit.

2) Granularity and big data: consequences for alignment and visualization

As mentioned above, my colleagues and I are working on the text as image at word and character level. This level of granularity has consequences for the format and for the software we use. Indeed, annotating several hundred thousand characters manually is very time-consuming and challenging, and the coordinates of the zones have to be modified. The Oriflamms format described above does not use @facs but only @xml:id and links. This helps keep the textual analysis in one file and the graphic analysis in other files and systems.
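In practice, "@xml:id and links instead of @facs" works like this: to find where a word sits on the image, one follows a stand-off <link> from the word's id to a zone id and reads the zone's coordinates. The sketch below is a minimal version of that lookup. The file contents and ids are hypothetical and namespaces are omitted. Each link is assumed to hold exactly two pointers, word first and zone second.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical facsimile file for one image and a hypothetical link file.
FACS = """<facsimile><surface xml:id="img1">
  <zone xml:id="z1" ulx="10" uly="20" lrx="40" lry="50"/>
  <zone xml:id="z2" ulx="45" uly="20" lrx="160" lry="50"/>
</surface></facsimile>"""
LINKS = """<linkGrp type="alignment">
  <link target="#w1 #z1"/><link target="#w2 #z2"/>
</linkGrp>"""

def zone_for_word(word_id, facs, links):
    """Follow the stand-off link from a word id to its zone's coordinates.
    Assumes each link holds two pointers: word first, zone second."""
    zones = {z.get(XML_ID): z for z in facs.iter("zone")}
    for link in links.iter("link"):
        word_ref, zone_ref = link.get("target").split()
        if word_ref == "#" + word_id:
            zone = zones[zone_ref.lstrip("#")]
            return tuple(int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
    return None  # word not aligned (yet)

facs = ET.fromstring(FACS)
links = ET.fromstring(LINKS)
print(zone_for_word("w2", facs, links))  # (45, 20, 160, 50)
```

Because the coordinates live only in the facsimile file, correcting a zone never touches the transcription.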

The consequences for software engineering are that we have
- a routine in the TXM software to prepare a corpus from an edited text. At this stage, this routine requires a @facs on <pb/> to indicate the beginning of a new image, but you can then have several pages on one image.
- software to align TEI-encoded texts with images and to visualize, correct and validate the results in linear and tabular form.
This software is open source: https://github.com/Liris-Pleiad/oriflamms (a .exe version is available).
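One way to read the TXM convention described above: a <pb/> carrying @facs opens a new image, and a <pb/> without it is another page reproduced on the same image. The sketch below applies that reading. It illustrates the convention as described here and is not TXM's own code. The sample edition and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical edition: folios 1r and 1v are reproduced on one image.
TEI = """<text><body>
  <pb n="1r" facs="img001.jpg"/><p>...</p>
  <pb n="1v"/><p>...</p>
  <pb n="2r" facs="img002.jpg"/><p>...</p>
</body></text>"""

def pages_per_image(root):
    """Group page numbers under the image that reproduces them: a <pb/> with
    @facs opens a new image; a <pb/> without it stays on the current one."""
    groups = {}
    current = None
    for pb in root.iter("pb"):  # document order
        if pb.get("facs"):
            current = pb.get("facs")
            groups.setdefault(current, [])
        if current is None:
            raise ValueError(f"page {pb.get('n')} comes before any pb/@facs")
        groups[current].append(pb.get("n"))
    return groups

print(pages_per_image(ET.fromstring(TEI)))
# {'img001.jpg': ['1r', '1v'], 'img002.jpg': ['2r']}
```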

In the more recent European project HIMANIS, we have used TEI editions to train HTR (Handwritten Text Recognition) systems and to provide our textual community with an indexed corpus of 147 medieval manuscripts in Latin and French. The result is a giant index in which you can search and set parameters for word confidence. Each word region on an image may be indexed with several recognition hypotheses (typically ten), each with a confidence level.
If you haven't seen it yet, please have a look at the search engine (http://prhlt-kws.prhlt.upv.es/himanis/) and the instructions (http://himanis.hypotheses.org/105), and please don't forget to validate or reject the hits you find. The project is not finished and we will add a lot of things, but for now you can search for words and spot them with coordinates on the image.
From a data modeling point of view, the results of Key Word Spotting (KWS) are typically "annotations" in the IIIF sense. There is no strict reading order (in the workflow there is a line recognition step, and key words are typically spotted on the lines, but a false line segmentation does not prevent key word spotting from being accurate). For "sequence search", we assume a top-to-bottom, left-to-right reading order and "graphic proximity" on the page, but this is not a "phrase" search. User feedback on KWS results is annotation on annotations. The beta interface does not provide a visualisation of all annotations, but IIIF, despite being very verbose, would be a "natural" format for exchanging those annotations on images.
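The sketch below treats KWS results as IIIF-style annotations, as suggested above. Hits below a confidence threshold are dropped, the rest are ordered top-to-bottom then left-to-right, and each becomes a W3C Web Annotation whose target is a canvas region (#xywh=). The canvas URI, the threshold, the id scheme and the "tagging" motivation are all assumptions. As far as I know, the Web Annotation model has no dedicated confidence property, so the score is not carried over. Sorting on the raw y coordinate is naive: words on one line with slightly different y values would need to be bucketed by line first.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One spotted word: recognised form, confidence, and box on the canvas."""
    word: str
    confidence: float
    x: int
    y: int
    w: int
    h: int

def hits_to_annotations(hits, canvas_uri, min_confidence=0.5):
    """Drop low-confidence hits, order the rest top-to-bottom then
    left-to-right, and express each as a Web Annotation on a canvas region."""
    kept = sorted((h for h in hits if h.confidence >= min_confidence),
                  key=lambda h: (h.y, h.x))
    return [
        {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": f"{canvas_uri}/kws/{i}",  # hypothetical id scheme
            "type": "Annotation",
            "motivation": "tagging",        # one plausible choice
            "body": {"type": "TextualBody", "value": h.word,
                     "format": "text/plain"},
            "target": f"{canvas_uri}#xywh={h.x},{h.y},{h.w},{h.h}",
        }
        for i, h in enumerate(kept)
    ]

hits = [
    Hit("dominus", 0.9, 300, 100, 80, 30),
    Hit("deus", 0.3, 50, 100, 60, 30),    # below threshold: dropped
    Hit("rex", 0.8, 40, 200, 50, 30),
    Hit("et", 0.7, 20, 100, 20, 30),
]
for ann in hits_to_annotations(hits, "https://example.org/iiif/ms1/canvas/f1r"):
    print(ann["body"]["value"], ann["target"])
```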

3) Data conversion and software

Software such as Transkribus (https://transkribus.eu/Transkribus/), developed by READ, in which the University of Valencia (Spain) is a partner, can handle the PAGE format and TEI very effectively and export correctly from one format to the other.

Implementing IIIF to visualize all annotations is an obvious target. But, going back to the discussion, we also wish to provide one linear transcript (the string built from the sequence of hypotheses with the best confidence), reintegrated into TEI to allow for correction, validation and semantic encoding, representing the text as a text and not only as a conundrum of 40 automated annotations with confidence levels plus one or several annotations from (human) scholars or users, with or without reading order. From a logical perspective, identifying, say, a quotation in the text is not the same as marking a sequence of canvas as a quotation. To me, a sequence of canvas is not a meaning; it is graphic content that can be read.
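The linear transcript described above could be built by taking the best-scoring hypothesis per word region and emitting it as TEI <w> tokens, so that it can then be corrected and encoded. The sketch below does this under stated assumptions. The region ids and hypotheses are hypothetical, and dict order stands in for an already established reading order. @facs is used only for brevity; the Oriflamms approach described above would keep the word/region pairing in a separate link file.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTR/KWS output: region id -> (reading, confidence) hypotheses.
# Dict order is taken as reading order for this sketch.
regions = {
    "r1": [("dominus", 0.62), ("dominum", 0.31)],
    "r2": [("noster", 0.88), ("nostri", 0.07)],
}

def best_reading(hyps):
    """Pick the hypothesis with the highest confidence."""
    return max(hyps, key=lambda hyp: hyp[1])[0]

def to_tei_line(regions):
    """Build one linear transcript as TEI <w> tokens, each keeping a
    pointer back to its image region."""
    ab = ET.Element("ab")
    for region_id, hyps in regions.items():
        w = ET.SubElement(ab, "w", {"facs": "#" + region_id})
        w.text = best_reading(hyps)
        w.tail = " "
    return ab

line = to_tei_line(regions)
print(" ".join(w.text for w in line))  # dominus noster
```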

With some of the same partners plus the Library of Poitiers and Teklia, and with some funding from Biblissima, we want to make our HTR developments IIIF-compliant. That is, we want to produce text transcriptions (from HTR) in TEI format, as described above, with <text>, <facsimile> and links (this is less efficient than PAGE or, for that matter, IIIF, but it opens the way to linguistic and palaeographic analysis). We would then publish them as a manifest and annotations for IIIF (each transcribed word or character being the content of an "annotation" on a particular "canvas" on the image), use the IIIF APIs to present the results and collect feedback and corrections while keeping the needed ids, and finally re-nest the results into TEI files (tokenized at word/character level).
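The round trip described above can be sketched as two functions. One publishes each tokenized TEI word as an annotation on a canvas region, reusing its xml:id in the annotation id. The other writes corrected annotation bodies back into the TEI by matching those ids. The canvas URI, word boxes and id scheme are hypothetical. "supplementing" is the motivation IIIF Presentation 3.0 uses for transcriptions. The annotations are abbreviated (no @context, no AnnotationPage).

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical inputs: tokenized TEI, a canvas URI, and word boxes (x, y, w, h).
TEI = '<ab><w xml:id="w1">dominus</w> <w xml:id="w2">nostre</w></ab>'
CANVAS = "https://example.org/iiif/ms1/canvas/f1r"
BOXES = {"w1": (120, 340, 90, 28), "w2": (215, 340, 80, 28)}

def words_to_annotations(ab):
    """Publish each TEI word as an annotation on a canvas region,
    reusing its xml:id as the last segment of the annotation id."""
    anns = []
    for w in ab.iter("w"):
        wid = w.get(XML_ID)
        x, y, width, height = BOXES[wid]
        anns.append({
            "id": f"{CANVAS}/anno/{wid}",
            "type": "Annotation",
            "motivation": "supplementing",
            "body": {"type": "TextualBody", "value": w.text,
                     "format": "text/plain"},
            "target": f"{CANVAS}#xywh={x},{y},{width},{height}",
        })
    return anns

def apply_corrections(ab, anns):
    """Re-nest (possibly corrected) annotation bodies into the TEI by id."""
    words = {w.get(XML_ID): w for w in ab.iter("w")}
    for ann in anns:
        wid = ann["id"].rsplit("/", 1)[-1]
        words[wid].text = ann["body"]["value"]

ab = ET.fromstring(TEI)
anns = words_to_annotations(ab)
anns[1]["body"]["value"] = "noster"  # a user's correction made in the viewer
apply_corrections(ab, anns)
print([w.text for w in ab.iter("w")])  # ['dominus', 'noster']
```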

So, to wrap up: both TEI and IIIF offer ways to link at a finer level of granularity than the page. If I understand correctly, one could probably do everything in IIIF that has been done in TEI, by annotating a sequence of annotations at word or character level and marking this sequence as "tei:p" or "tei:l", etc., but I am not sure that this would greatly benefit either community. In our projects, we have worked in both directions: starting from TEI editions to create data on images, and starting from image analysis to create textual content. As a palaeographer working on both text and image, I am really convinced of the need to have text analysis as well as annotations. The proposed strategy would be to use each format for what it does best and to implement automated mechanisms that let our formats communicate seamlessly; working at the finest level of granularity probably makes this easier.

Best regards,
Dominique


On Saturday 24 June 2017 at 20:00, "Robinson, Peter" <[hidden email]> wrote:


Time for a little context, I think.

The IIIF community is large, growing, and multifaceted (sound familiar, anyone?). For some time now, several of us (beginning with Domhnall Ó h’Éigheartaigh, Patrick Cuba, myself and various others) have been looking at how IIIF and complex texts might play together. This group now includes (among others) John Bryant, Ben Brumfield (whose post on this list sparked this discussion), Jeffrey Witt, John Howard, Raffaele Viglianti and Nick Laiacona. Many of us were at the recent IIIF conference in Rome, where we presented a series of ruminations on the potential (great!), technical issues (multiple) and possible strategies (far too many) on how we might link complicated texts, typically referencing information extending far beyond the page-based model of IIIF, with IIIF.

No firm answers yet. Anyone who wants to join our group as we wrestle with all this, please email any one of us (in the distribution list on this email). I can imagine that some time in the future (the November TEI members meeting?) the TEI itself might want to look at linkages betwixt TEI and IIIF.

Peter





Re: IIIF and facs (and TEI)

KANZAKI Masahide
Hello all,

I have a small experiment that connects TEI/XML data and images via
IIIF, which might be of interest to you.

Linked First Folio [1] allows users to search for words in Shakespeare's
plays and to reach a page that contains the results as well as the
facsimile image of that page. It uses the TEI and IIIF data from the
Bodleian First Folio [2] to associate a word or phrase in the XML with
the page image.

It does not directly use the facs attribute values in the Bodleian TEI.
Instead, it uses a pre-generated mapping between a page range in the TEI
and an image resource in the IIIF manifest, expressed as a Web Annotation, e.g.:

{
    "id": "p152",
    "type": "Annotation",
    "label": "Hamlet: Act 1, Scene 1, p152",
    "body": {
        "id": "nn4v",
        "format": "application/tei+xml",
        "source": "http://firstfolio.bodleian.ox.ac.uk/download/xml/F-ham.xml",
        "selector": {
            "type": "RangeSelector",
            "startSelector": {
                "type": "XPathSelector",
                "value": "//pb[@n='152']"
            },
            "endSelector": {
                "type": "XPathSelector",
                "value": "//pb[@n='153']"
            }
        }
    },
    "target": "http://iiif.bodleian.ox.ac.uk/iiif/image/e6ad69d4-9b90-4afc-a32d-d4af0889f1b8/full/full/0/default.jpg"
}
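A client could resolve such a RangeSelector against the TEI by finding the start and end elements and collecting the text between them in document order. The sketch below is a rough version of that. It assumes a namespace-free sample; the Bodleian files are in the TEI namespace, which would need handling. It also relies on ElementTree's limited XPath support, so "//pb[...]" is rewritten as ".//pb[...]". A full XPath engine would be needed for arbitrary selectors. The sample text is hypothetical.

```python
import xml.etree.ElementTree as ET

def stream(elem):
    """Yield ('open', el), ('text', s) and ('close', el) events in document order."""
    yield ("open", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from stream(child)
    yield ("close", elem)
    if elem.tail:
        yield ("text", elem.tail)

def resolve_range(root, start_xpath, end_xpath):
    """Return the text from the start element up to (not including) the end
    element, with whitespace normalised."""
    start = root.find("." + start_xpath)
    end = root.find("." + end_xpath)
    if start is None or end is None:
        raise ValueError("selector did not match")
    parts, active = [], False
    for kind, value in stream(root):
        if kind == "open" and value is start:
            active = True
        elif kind == "open" and value is end:
            break
        elif active and kind == "text":
            parts.append(value)
    return " ".join("".join(parts).split())

TEI = """<div><pb n="152"/><sp><speaker>Barnardo.</speaker> <l>Who's there?</l></sp>
<pb n="153"/><l>(text of the next page)</l></div>"""
root = ET.fromstring(TEI)
print(resolve_range(root, "//pb[@n='152']", "//pb[@n='153']"))  # Barnardo. Who's there?
```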


I hope this is relevant to the discussion.

best regards,

[1] http://www.kanzaki.com/works/ld/firstfolio
[2] http://firstfolio.bodleian.ox.ac.uk/

2017-06-27 21:26 GMT+09:00 Stutzmann Dominique
<[hidden email]>:

> References:
> (on the software)
> http://ieeexplore.ieee.org/document/6981046/?reload=true&arnumber=6981046
> (on the purpose of the research)
> https://www.cairn.info/revue-document-numerique-2013-3-page-81.htm
> (on the alignment)
> http://dh2015.org/abstracts/xml/STUTZMANN_Dominique_From_Text_and_Image_to_Histor/STUTZMANN_Dominique_From_Text_and_Image_to_Historical_R.html



--
@prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
"KANZAKI Masahide"; :nick "masaka"; :email "[hidden email]"].

Re: IIIF and facs

Peter Flynn-8
In reply to this post by Ben Brumfield
On 06/18/2017 12:24 PM, Ben Brumfield wrote:
[...]
> I'd be interested in discussing best practices for linking from TEI
> documents to page facsimiles hosted on IIIF image services.  At the
> moment I think that the only option we have is to insert a URL to a
> maximum-resolution image into the *facs* attribute of *pb*.  I'd like
> to preserve that option for TEI viewers that don't support IIIF, but
> is there anything better we could do?

I seem to have missed or misunderstood something in the ensuing
discussion. Admittedly, I am looking at this from the point of view of
an implementer, not an encoder, so I may have a different focus.

I am assuming that:

a) an image-set for a document is on a server;
b) each page-image is addressable by a unique URI;
c) the URI uses some kind of counting-token for each page,
   eg page number, folio, sheet, frame, etc;
d) this token is part of the accepted scheme scholars use
   for this document.

It is (IMHO) the business of the encoder to ensure that the relevant
milestones recognised by the user community as the canonical reference
method for each document are included in the TEI markup for the
document, so that users can find out where they are.

Then the technology (eg XSLT) that serves up search results can
trivially locate preceding::mls[1] or preceding::pb[1] or whatever for
any given hit, and form the URI for the image that by definition will
include the location in question.
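Outside XSLT, the same "nearest preceding page break" lookup can be sketched in Python. This is a minimal illustration, not Peter's implementation: it assumes the canonical page token lives in pb/@n, and the URI template `https://images.example.org/ms/{n}.jpg` is a hypothetical stand-in for whatever scheme the image server actually uses. Matching is a plain substring test on each text chunk.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def walk(elem):
    """Yield ('pb', element) and ('text', string) events in document order."""
    if elem.tag in ("pb", TEI_NS + "pb"):
        yield ("pb", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:  # text that follows the child, still inside elem
            yield ("text", child.tail)


def locate_hits(tei_xml, word, uri_template):
    """Return (page token, image URI) for every text chunk containing `word`.

    The most recent <pb> seen during the walk plays the role of
    preceding::pb[1]; its @n is substituted into the (hypothetical) template.
    """
    current_pb = None
    hits = []
    for kind, value in walk(ET.fromstring(tei_xml)):
        if kind == "pb":
            current_pb = value
        elif word in value and current_pb is not None:
            n = current_pb.get("n")
            hits.append((n, uri_template.format(n=n)))
    return hits


doc = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
<pb n="1r"/><p>In the beginning</p>
<pb n="1v"/><p>was the <hi>word</hi> again</p>
</body></text>"""

print(locate_hits(doc, "word", "https://images.example.org/ms/{n}.jpg"))
# [('1v', 'https://images.example.org/ms/1v.jpg')]
```

Because the image URI is computed from the encoded milestone rather than stored in the markup, the image server can change without touching the TEI, which is the separation argued for below.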

This separates the two mechanisms, allowing the adoption of different
server technologies on either side in the future with minimal recoding.

It does, however, depend on the encoding of the canonical reference
(milestone) data for the document, and there are of course documents
with more than one such reference method, and many with none at all (but
presumably at least do have page numbers or folios; scrolls are a
different problem); and it depends on the creator of the image-set doing
the same.

Do those two criteria present particular difficulties where IIIF image
hosting is concerned?

///Peter

Re: IIIF and facs (and TEI)

Ben Brumfield
In reply to this post by Christian-Emil Ore
I'm delighted to see the interest from the TEI community in connecting/converting IIIF, TEI and related formats like PAGE, and have been following the discussion with interest.

I'd like to return to a more tactical question about TEI and the IIIF Image API.  While TEI zones correspond well to IIIF regions, neither standard really requires us to use such subsets of a page image, as facs can point to a whole page image, and a IIIF canvas's image resource generally will display the entire page of a manuscript.  I'd like to know more about what I suspect will be the most common case -- associating a page transcript with a page facsimile using facs to point to a IIIF-hosted document.

We can certainly point our facs attributes at a IIIF-compliant URL, but how do we indicate to a IIIF-aware TEI viewer that there is a IIIF image endpoint which can be used for deep zooming by a client like OpenSeadragon?  I gather that the value of a facs attribute can refer to nearly anything, and need not be a URL.  Is there a way to add IIIF-specific data to facs?  Should that be better addressed by another attribute on pb?

I'm imagining that something basic like <pb facs="$ENDPOINT/full/full/0/default.jpg"> (which would work for a viewer unaware of IIIF) could be expanded along the lines of
<pb facs="$ENDPOINT/full/full/0/default.jpg" iiif="$ENDPOINT"> or perhaps <pb facs="$ENDPOINT/full/full/0/default.jpg; iiif=$ENDPOINT"> but this is really new territory for me, and could use advice on existing practice.
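If the full-image URL in facs follows the IIIF Image API request pattern ({base}/{region}/{size}/{rotation}/{quality}.{format}), a IIIF-aware viewer could try to recover the service base ($ENDPOINT) from it and otherwise treat the URL as a plain image. A minimal sketch of that heuristic, assuming Image API 2.x/3.0-style parameter values; a non-IIIF URL could match by coincidence, so a real viewer would still want to confirm by fetching {base}/info.json. The second test URL below is hypothetical.

```python
import re

# {base}/{region}/{size}/{rotation}/{quality}.{format}, per the IIIF Image API
IIIF_REQUEST = re.compile(
    r"^(?P<base>https?://.+?)"
    r"/(?P<region>full|square|pct:[\d.,]+|\d+,\d+,\d+,\d+)"
    r"/(?P<size>full|max|pct:[\d.]+|!?\d*,\d*)"
    r"/(?P<rotation>!?\d+(?:\.\d+)?)"
    r"/(?P<quality>default|color|gray|bitonal)"
    r"\.(?P<format>jpg|png|gif|tif|webp|jp2|pdf)$"
)


def iiif_endpoint(facs_url):
    """Guess the IIIF image service base from a facs URL, or return None."""
    m = IIIF_REQUEST.match(facs_url)
    return m.group("base") if m else None


url = ("http://iiif.bodleian.ox.ac.uk/iiif/image/"
       "e6ad69d4-9b90-4afc-a32d-d4af0889f1b8/full/full/0/default.jpg")
base = iiif_endpoint(url)
# The image information document a deep-zoom client would load:
print(base + "/info.json")
# http://iiif.bodleian.ox.ac.uk/iiif/image/e6ad69d4-9b90-4afc-a32d-d4af0889f1b8/info.json
```

The appeal of this approach is that the markup stays a single, ordinary URL that non-IIIF viewers can dereference; the cost is that IIIF support is inferred rather than declared.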

Thanks,

Ben

Re: IIIF and facs (and TEI)

Martin Mueller
An interesting thread. A good opportunity for someone with larger technical chops than I possess to write a digest along the lines of TEI and IIIF in 2017: the State of the Art. I know next to nothing about the underlying technologies, but I sense from conversations with librarians that things are on the cusp of moving. So quite a few readers of this list might appreciate a digest of this thread.





Re: IIIF and facs (and TEI)

Lou Burnard
In reply to this post by Ben Brumfield
On 29/06/17 02:47, Ben Brumfield wrote:
>   I gather that the value of a facs attribute can refer to nearly anything, and need not be a URL.  Is there a way to add IIIF-specific data to facs?  Should that be better addressed by another attribute on pb?

The value of a @facs attribute must conform to the W3C URI syntax. So it
can be any kind of pointer. I am not sure what you mean by
"IIIF-specific data" but if it can be expressed as a URI (which I would
be surprised if not) then you're home and dry.
Of course you're not doing the non-IIIF user any favours by using
something which is entirely meaningless except to IIIF-aware processors.
The TEI prefix definition mechanism might perhaps help here: you could
do <pb facs="IIIF:image.jpg"/> throughout your document and then define
a mapping between the string "IIIF" and a full URL using <prefixDef>.
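A minimal sketch of how a processor might expand such a private URI, assuming a <prefixDef> whose @matchPattern is a regular expression matched against the part after the colon and whose @replacementPattern uses $1, $2, etc. for captured groups. The image server URL and identifiers are hypothetical. The prefix is written in lower case here, and the comparison ignores case because URI schemes are case-insensitive.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical header fragment: maps iiif:IDENTIFIER to a full-size IIIF image URL.
LIST_PREFIX_DEF = """<listPrefixDef xmlns="http://www.tei-c.org/ns/1.0">
  <prefixDef ident="iiif" matchPattern="([^/]+)"
    replacementPattern="https://iiif.example.org/image/$1/full/full/0/default.jpg"/>
</listPrefixDef>"""


def load_prefix_defs(xml_text):
    """Collect (ident, matchPattern, replacementPattern) triples."""
    root = ET.fromstring(xml_text)
    return [(p.get("ident"), p.get("matchPattern"), p.get("replacementPattern"))
            for p in root.iter(TEI + "prefixDef")]


def expand(pointer, prefix_defs):
    """Expand 'prefix:rest' with the first matching prefixDef; else return it unchanged."""
    prefix, sep, rest = pointer.partition(":")
    if not sep:
        return pointer
    for ident, match_pattern, replacement in prefix_defs:
        if ident.lower() != prefix.lower():
            continue
        m = re.fullmatch(match_pattern, rest)
        if m:
            # Substitute $1, $2, ... with the captured groups.
            return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)
    return pointer


defs = load_prefix_defs(LIST_PREFIX_DEF)
print(expand("iiif:ms123-f1r", defs))
# https://iiif.example.org/image/ms123-f1r/full/full/0/default.jpg
print(expand("https://example.org/page1.jpg", defs))  # ordinary URLs pass through
```

A processor that does not understand the prefix sees only an unresolvable pointer, which is the trade-off noted above for non-IIIF users.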


> I'm imagining that something basic like <pb facs="$ENDPOINT/full/full/0/default.jpg"> (which would work for a viewer unaware of IIIF) could be expanded along the lines of
> <pb facs="$ENDPOINT/full/full/0/default.jpg" iiif="$ENDPOINT"> or perhaps <pb facs="$ENDPOINT/full/full/0/default.jpg; iiif=$ENDPOINT"> but this is really new territory for me, and could use advice on existing practice.

Not sure I understand your examples here. The second two seem identical.

IIIF and facs

Jeffrey Witt
In reply to this post by Ben Brumfield
Hi All,

Just to add a few more data points, here’s how we currently do things at the SCTA (http://scta.info).

First, here’s a short write up of some modelling decisions: http://lombardpress.org/2016/08/09/surfaces-canvases-and-zones/ 

This post is basically a description of why I separate the concept of a Manifestation Surface from an Item Surface from a IIIF Canvas, and then how I link them together.

Then, at the TEI level, our editors basically embed the Manifestation Surface Id into the Milestone elements, like so: <pb ed="#S" n="2-r"/> and <surface n="2-r">

All surfaces get recorded as RDF Resources that can be de-referenced by a client using the information embedded in the TEI as seen above. See for example http://scta.info/resource/sorb/2r.

In the end, this means that when a user is reading a TEI text, a click event on a milestone element, for example, triggers a request to the RDF triple store for the corresponding surface, and from here the SPARQL query looks for the default ISurface and from here the default Canvas ID. Since, at present, in the IIIF world we cannot rely on canvases themselves to be “de-referenceable”, I ingest all canvas information into the triple store as well. Thus, the SPARQL query continues from the Canvas ID to the Service ID of the image itself. It then retrieves and displays the actual image.

See for example: http://scta.lombardpress.org/text/lectio1. Click on the link “S2ra” (or any other folio marker) and you should see the corresponding image appear on your screen retrieved from distributed libraries all over the world via IIIF. If you select the paragraph menu at the end of any paragraph and then select “Manuscript Images”, the same query is happening, but this time coordinate regions of the target zone are being used to request only the desired region of interest from the IIIF server. (This coordinate information is originally stored in the surface element in the TEI header, but gets converted to RDF when the text gets crawled and aggregated.)
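The region request in that last step uses the IIIF Image API's {region}/{size}/{rotation}/{quality}.{format} syntax. A minimal sketch, not SCTA's code, assuming the zone is stored with TEI's ulx/uly/lrx/lry attributes in the same pixel space as the IIIF image (if the surface was measured on a copy at a different resolution, the coordinates would first need scaling). The service URL is hypothetical. Image API 2.x uses the size value "full"; 3.0 uses "max" instead.

```python
def zone_to_region(ulx, uly, lrx, lry):
    """Convert TEI zone corners to an IIIF x,y,w,h region."""
    if lrx <= ulx or lry <= uly:
        raise ValueError("lower-right corner must lie below and right of upper-left")
    return ulx, uly, lrx - ulx, lry - uly


def region_url(service_id, region, size="full", rotation=0,
               quality="default", fmt="jpg"):
    """Build an IIIF Image API request for a rectangular region of the image."""
    x, y, w, h = region
    return f"{service_id.rstrip('/')}/{x},{y},{w},{h}/{size}/{rotation}/{quality}.{fmt}"


service = "https://iiif.example.org/image/ms-f2r"  # hypothetical image service
print(region_url(service, zone_to_region(100, 200, 600, 450)))
# https://iiif.example.org/image/ms-f2r/100,200,500,250/full/0/default.jpg
print(region_url(service, zone_to_region(100, 200, 600, 450), size="max"))
# https://iiif.example.org/image/ms-f2r/100,200,500,250/max/0/default.jpg
```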

That, at least, is what I currently do.

Thoughts and questions welcome.

Jeff Witt


On 6/29/17, 12:00 AM, "TEI (Text Encoding Initiative) public discussion list on behalf of TEI-L automatic digest system" <[hidden email] on behalf of [hidden email]> wrote:

    There are 4 messages totaling 565 lines in this issue.
   
    Topics of the day:
   
      1. IIIF and facs (and TEI) (3)
      2. IIIF and facs
   
    ----------------------------------------------------------------------
   
    Date:    Wed, 28 Jun 2017 22:11:35 +0900
    From:    KANZAKI Masahide <[hidden email]>
    Subject: Re: IIIF and facs (and TEI)
   
    Hello all,
   
    I have a small experiment that connects TEI/XML data and images via
    IIIF, which might be of interest to you.
   
    Linked First Folio [1] allows users to search words in Shakespeare's
    plays and to reach a page that contains the results as well as the
    facsimile image of the page. It utilizes TEI and IIIF from the
    Bodleian First Folio [2] to associate a word/phrase in XML and the
    page image.
   
    It does not directly use the facs attribute values in the Bodleian TEI. Instead, it
    uses a pre-generated mapping between a page range in TEI and an image
    resource in the IIIF manifest, expressed as a Web Annotation, e.g.
   
    {
        "id": "p152",
        "type": "Annotation",
        "label": "Hamlet: Act 1, Scene 1, p152",
        "body": {
            "id": "nn4v",
            "format": "application/tei+xml",
            "source": "http://firstfolio.bodleian.ox.ac.uk/download/xml/F-ham.xml",
            "selector": {
                "type": "RangeSelector",
                "startSelector": {
                    "type": "XPathSelector",
                    "value": "//pb[@n='152']"
                },
                "endSelector": {
                    "type": "XPathSelector",
                    "value": "//pb[@n='153']"
                }
            }
        },
        "target": "http://iiif.bodleian.ox.ac.uk/iiif/image/e6ad69d4-9b90-4afc-a32d-d4af0889f1b8/full/full/0/default.jpg"
    }
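The RangeSelector above identifies the text of one page as everything between two page breaks. A minimal sketch of resolving such a range against a TEI file, written as a document-order walk rather than with an XPath engine. One hedged caveat: in XPath 1.0 an unprefixed name test such as //pb matches only elements in no namespace, so against a TEI-namespaced file a strict processor would normally need a prefixed selector such as //tei:pb. This sketch sidesteps that by comparing local names. The sample document is invented.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def text_between_pbs(tei_xml, start_n, end_n):
    """Return whitespace-normalised text from <pb n=start_n/> up to <pb n=end_n/>."""
    chunks = []
    inside = False

    def walk(elem):
        nonlocal inside
        if local_name(elem.tag) == "pb":
            if elem.get("n") == start_n:
                inside = True
            elif elem.get("n") == end_n:
                inside = False
        if inside and elem.text:
            chunks.append(elem.text)
        for child in elem:
            walk(child)
            if inside and child.tail:
                chunks.append(child.tail)

    walk(ET.fromstring(tei_xml))
    return " ".join("".join(chunks).split())


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="151"/><l>Before</l>
<pb n="152"/>
<sp><l>Who's there?</l>
<l>Nay, answer me</l></sp>
<pb n="153"/><l>After</l>
</body></text></TEI>"""

print(text_between_pbs(sample, "152", "153"))  # Who's there? Nay, answer me
```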
   
   
    Hope this would be relevant to the discussion.
   
    best regards,
   
    [1] http://www.kanzaki.com/works/ld/firstfolio
    [2] http://firstfolio.bodleian.ox.ac.uk/
   
    2017-06-27 21:26 GMT+09:00 Stutzmann Dominique
    <[hidden email]>:
    > Dear Peter, Georg, and all,
    >
    > since Georg invited me to contribute, here are my thoughts on the different,
    > connected issues, esp. format (TEI, PAGE, IIIF), granularity, text and
    > image, text as annotation and its visualisation, and software engineering.
    >
    >
    > 1) Formats for linking text and image
    >
    > a) TEI and annotation coordinates
    >
    > In several projects, colleagues from linguistics and palaeography, including
    > Alexei Lavrentiev and me, have felt the need to link images closely with
    > (analyzed) text at the levels of words and characters.
    >
    > This led us to specify a stand-off TEI format to deal with <facsimile> and
    > <text>. The format developed in the Oriflamms project is here (described in
    > French):
    > (part 1) http://oriflamms.hypotheses.org/1442
    > (part 2)  http://oriflamms.hypotheses.org/1510
    >
    > The main principles are that
    > - the texts are encoded in TEI (or teiCorpus > TEI) > text and tokenized
    > with <c> and <w> with @xml:id
    > - the corresponding <facsimile> and <zone> declarations are in separate
    > files, with @xml:id
    > - there is one file per image linking the @xml:id from the textual content
    > (at character, word, line, column, page level) to the graphic content.
    >
    > These <facsimile> declarations are stored in one file per image (in a
    > distinct folder) and we create <zone> elements for page (as stated in the
    > discussion, you may have several pages reproduced on one image), columns,
    > line, word and character or punctuation. A word can cross the
    > page/column/line break. A character can cross the word separation (it is
    > quite rare, but it happens, e.g. st ligature across two words).
    >
    > Several corpora in this format are on the project's GitHub instance:
    > https://github.com/oriflamms (start with a
    > https://github.com/oriflamms/Test_Fontenay/).
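To make the stand-off arrangement concrete: once the token ids and the zone ids live in separate files, a consumer has to join them through the linking file. A minimal sketch, assuming rectangular <zone> elements with ulx/uly/lrx/lry and TEI <link target="token zone"/> pairs (token first, zone second). The actual Oriflamms link syntax is described in the posts above and may differ, and all ids here are invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def zones_by_id(facsimile_xml):
    """Map zone xml:id -> (ulx, uly, lrx, lry) for rectangular zones."""
    root = ET.fromstring(facsimile_xml)
    return {
        z.get(XML_ID): tuple(int(z.get(a)) for a in ("ulx", "uly", "lrx", "lry"))
        for z in root.iter(TEI + "zone")
    }


def token_zones(links_xml, zones):
    """Join text tokens (<w>, <c>, ...) to zone coordinates through <link> pairs."""
    result = {}
    for link in ET.fromstring(links_xml).iter(TEI + "link"):
        token_ref, zone_ref = link.get("target").split()
        # Keep only the fragment, so both "#w1" and "text.xml#w1" work.
        token_id = token_ref.rsplit("#", 1)[-1]
        result[token_id] = zones[zone_ref.rsplit("#", 1)[-1]]
    return result


facs = """<facsimile xmlns="http://www.tei-c.org/ns/1.0"><surface>
  <zone xml:id="z_w1" ulx="10" uly="20" lrx="90" lry="40"/>
  <zone xml:id="z_w2" ulx="95" uly="20" lrx="150" lry="42"/>
</surface></facsimile>"""
links = """<linkGrp xmlns="http://www.tei-c.org/ns/1.0">
  <link target="text.xml#w1 #z_w1"/>
  <link target="text.xml#w2 #z_w2"/>
</linkGrp>"""

print(token_zones(links, zones_by_id(facs)))
# {'w1': (10, 20, 90, 40), 'w2': (95, 20, 150, 42)}
```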
    >
    >
    > b) PAGE and TEI
    >
    > The PAGE format is dedicated to describing what is on a page. It does have
    > some structured information that TEI cannot render in the same structured
    > way. This naturally follows from PAGE being an image-oriented format and TEI
    > a text-oriented one. The description of the layout has a very fine level of
    > granularity in PAGE: for example, there are attributes @colourDepth or
    > @bgColour to give the information about "The colour bit depth required for
    > the region" or "The background colour of the region".
    > One can transfer this type of information to TEI, but often only in a
    > non-structured or non-explicit way.
    >
    > For example PAGE @indented in RegionType may correspond to:
    > - @rend="indent" at different possible levels : level of layout (<pb/>,
    > <cb/> but applying @rend='indent' at these elements would be an abuse),
    > level of textual analysis (<p> or <l>), or in a more neutral way: PAGE is
    > neutral and does not provide any analysis, so converting a block should
    > create <ab> rather than <p>.
    > - if there is a fully aligned text: shorter lines at the beginning of
    > paragraphs in TEI text + facsimile
    >
    > The main difference, from an intellectual perspective, is that PAGE is used
    > to store data from HTR or OCR. So, any part of "understanding" has to be
    > encoded additionally. For example, the reading order, which is implicit
    > in TEI, has to be made explicit in PAGE (<ReadingOrder>).
    >
    > As a matter of fact, in all instances of PAGE files that I have seen, there
    > was no information which we could not transfer straightforwardly from one
    > format to the other without using unstructured <desc> elements. This could
    > require additions to TEI (while remembering what "T" stands for in "TEI").
    >
    > c) IIIF and TEI
    > As evidenced in Ben Brumfield's excellent contribution, one might find it hard
    > to see the upside of a very verbose format that stores very small bits of
    > information without being able to encode them in the full meaning of the
    > word, that is, to analyze them. One strength is that IIIF (like PAGE) makes
    > the order of text elements (via annotations) explicit.
    >
    > 2) Granularity and big data: consequences for alignment and visualization
    >
    > As mentioned above, my colleagues and I are working on the text as image at
    > word and character level. This level of granularity has a consequence on
    > format and on the software we use. Indeed, annotating several hundred
    > thousand characters manually is very time-consuming and also challenging,
    > and the coordinates of the zones have to be modified. The Oriflamms format
    > described above does not make use of @facs but only uses @xml:id and links.
    > It helps keep the textual analysis in one file and the graphic analysis
    > in other files and systems.
    >
    > The consequence for the software engineering is that we have
    > - a routine on the software TXM to prepare a corpus from an edited text. At
    > this stage, this routine requires a @facs on <pb/> to indicate the beginning
    > of a new image, but you then can have several pages on one image.
    > - a software to align TEI encoded texts with images and visualize, correct
    > and validate the results in a linear and in a tabular form.
    > This software is open source : https://github.com/Liris-Pleiad/oriflamms
    > There is an .exe version.
    >
    > In the more recent European project HIMANIS, we have used TEI editions to
    > feed HTR (Handwritten Text Recognition) systems and provide our textual
    > community with an indexed corpus of 147 medieval manuscripts in Latin and
    > French. The result is a giant index in which you can search and set
    > parameters about word confidence. Each word region on an image may be
    > indexed with several recognition hypotheses (typically ten), each having a
    > confidence level.
    > If you haven't seen it yet, please have a look at (search engine)
    > http://prhlt-kws.prhlt.upv.es/himanis/ and  (instructions)
    > http://himanis.hypotheses.org/105 (and please, don't forget to validate or
    > reject the hits that are found). The project is not finished and we will add
    > a lot of things, but, by now, you can search for words and spot them with
    > coordinates on the image.
    > From a data modeling point of view, the result of Key Word Spotting is
    > typically an "annotation" in the sense of IIIF. There is no strict
    > reading order (in the workflow, there is a line recognition step and key
    > words are typically spotted on the lines, but having a false line
    > segmentation does not prevent key word spotting from being accurate). For
    > "sequence search", we assume a top to bottom, left to right reading order,
    > and "graphic proximity" on the page, but this is not a "phrase" search. User
    > feedback on KWS results is annotation on annotations. The beta-Interface
    > does not provide a visualisation of all annotations, but IIIF, despite being
    > very verbose, would be a "natural" format to exchange those annotations on
    > images.
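The "top to bottom, left to right" assumption used for sequence search can be approximated from word boxes alone, even without a trusted line segmentation. A minimal sketch of one such heuristic (not the HIMANIS implementation): boxes are (word, x, y, w, h) tuples, and a box joins the current line when its vertical centre lies within a hypothetical `line_tolerance` of that line's first box.

```python
def reading_order(boxes, line_tolerance=10):
    """Order (word, x, y, w, h) boxes top-to-bottom, then left-to-right.

    Boxes are grouped into lines by vertical centre: a box joins the current
    line if its centre is within line_tolerance pixels of that line's first box.
    """
    lines = []  # each entry: [centre_y_of_first_box, [boxes...]]
    for box in sorted(boxes, key=lambda b: b[2] + b[4] / 2):
        centre_y = box[2] + box[4] / 2
        if lines and abs(centre_y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(box)
        else:
            lines.append([centre_y, [box]])
    # Within each line, sort by x; then flatten to a word sequence.
    return [b[0] for _, line in lines for b in sorted(line, key=lambda b: b[1])]


boxes = [("ei", 120, 52, 20, 16), ("Et", 20, 50, 25, 18),
         ("Dominus", 30, 100, 45, 16), ("dixit", 70, 48, 40, 20)]
print(reading_order(boxes))  # ['Et', 'dixit', 'ei', 'Dominus']
```

A fixed pixel tolerance is the simplest choice; it will misbehave on skewed or tightly spaced lines, which is one reason the spotted words themselves need not depend on it.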
    >
    > References:
    > (on the software)
    > http://ieeexplore.ieee.org/document/6981046/?reload=true&arnumber=6981046
    > (on the purpose of the research)
    > https://www.cairn.info/revue-document-numerique-2013-3-page-81.htm
    > (on the alignment)
    > http://dh2015.org/abstracts/xml/STUTZMANN_Dominique_From_Text_and_Image_to_Histor/STUTZMANN_Dominique_From_Text_and_Image_to_Historical_R.html
    >
    > 3) Data conversion and software
    >
    > Software like Transkribus https://transkribus.eu/Transkribus/, developed
    > by READ, in which the University of Valencia (Spain) is a partner, can deal
    > with the PAGE format and TEI very effectively and export correctly from
    > one format to another.
    >
    > Implementing IIIF to visualize all annotations is an obvious target. But,
    > going back to the discussion, we also wish to provide one linear transcript
    > (the string built from the sequence of hypotheses with the best confidence),
    > reintegrated into TEI to allow for correction, validation and semantic
    > encoding, representing the text as a text and not only in a conundrum of 40
    > automated annotations with confidence level plus one or several annotations
    > from (human) scholars or users, with or without reading order. From a
    > logical perspective, it is not the same to identify let's say a quote in the
    > text and to mark a sequence of canvas as being a quote. To me, a sequence of
    > canvas is not a meaning, it is a graphic content that can be read.
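The "linear transcript" described here, the best-confidence hypothesis for each word region taken in reading order, is simple to state in code. A minimal sketch, assuming each region is already a list of (hypothesis, confidence) pairs and the regions are already in reading order; the Latin sample data are invented.

```python
def linear_transcript(regions):
    """Join the highest-confidence hypothesis of each word region.

    regions: word regions in reading order, each a list of
    (hypothesis, confidence) pairs; regions without hypotheses are skipped.
    """
    best = []
    for hypotheses in regions:
        if hypotheses:
            word, _confidence = max(hypotheses, key=lambda h: h[1])
            best.append(word)
    return " ".join(best)


regions = [
    [("dominus", 0.41), ("dominos", 0.33), ("domino", 0.12)],
    [("noster", 0.87), ("nostri", 0.05)],
    [],
]
print(linear_transcript(regions))  # dominus noster
```

The competing hypotheses are discarded here; keeping the region ids alongside each chosen word would be what allows corrections made in TEI to be mapped back onto the annotations.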
    >
    > With some of the same partners plus the Library of Poitiers and Teklia, and
    > with some funding from Biblissima, we want to make our developments in HTR
    > IIIF-compliant. That is to produce text transcriptions (from HTR) in TEI
    > format, as said above with <text>, <facsimile> and /links/ (even if the
    > result is less efficient than in PAGE or IIIF for that matter, because it
    > opens the way to linguistic and paleographic analysis), then publishing it
    > as manifest and annotations for IIIF (each transcribed word or character is
    > the content of an "annotation" on a particular "canvas" on the image), using
    > IIIF API to present the results and collect feedback and corrections,
    > keeping the needed ids and then re-nesting the results into TEI files
    > (tokenized at word/character level).
    >
    > So, wrapping up, both in TEI and IIIF there are ways to link at a finer
    > level of granularity than the page. If I understand correctly, one probably
    > could do everything in IIIF that has been done in TEI, by annotating a
    > sequence of annotation at word or character level and marking this sequence
    > as being "tei:p" or "tei:l" etc. , but I am not sure that it would be a
    > great benefit for one community or the other. In our projects, we had both
    > directions: starting from TEI editions to create data on images and starting
    > from image analysis to create textual content, and, as a paleographer,
    > working on both text and image, I really am convinced by the need to have
    > text analysis as well as annotations. The proposed strategy would be to use
    > each format for what it is most useful for and to implement automated
    > mechanisms to let our formats communicate seamlessly; working at the
    > finest level of granularity probably makes this easier.
    >
    > Best regards,
    > Dominique
    >
    >
    > Le Samedi 24 juin 2017 20h00, "Robinson, Peter" <[hidden email]> a
    > écrit :
    >
    >
    > Time for a little context, I think.
    >
    > The IIIF community is large, growing, and multifaceted (sound familiar,
    > anyone?). For some time now, several of us (beginning with Domhnall Ó
    > h’Éigheartaigh, Patrick Cuba, myself and various others) have been looking
    > at how IIIF and complex texts might play together. This group now includes
    > (among others) John Bryant, Ben Brumfield (whose post on this list sparked
    > this discussion), Jeffrey Witt, John Howard, Rafaelle Vigilante and Nick
    > Laicona. Many of us were at the recent IIIF conference in Rome, where we
    > presented a series of ruminations on the potential (great!), technical
    > issues (multiple) and possible strategies (far too many) on how we might
    > link complicated texts, typically referencing information extending far
    > beyond the page-based model of IIIF, with IIIF.
    >
    > No firm answers yet. Anyone who wants to join our group as we wrestle with
    > all this, please email any one of us (in the distribution list on this
    > email). I can imagine that some time in the future (the November TEI members
    > meeting?) the TEI itself might want to look at linkages betwixt TEI and
    > IIIF.
    >
    > Peter
    >
    > On Jun 23, 2017, at 9:38 PM, Christian-Emil Smith Ore <[hidden email]>
    > wrote:
    >>
    >>
    >> Hi
    >> We are looking for a decent viewer for the facsimiles of Henrik Ibsen's
    >> manuscripts, all of whose texts were transcribed as TEI XML documents by the large
    >> project Henrik Ibsen's writings, mostly in the 1990s. It is clear that
    >> something like the universal viewer (http://universalviewer.io/)  may do the
    >> job. This is a IIIF thing. I studied the specification of IIIF before I read
    >> Ben’s report from the IIIF. It is an easy match to view the facsimiles, but
    >> it is harder to add advanced (meta)data outside the simple
    >> open-annotation-universe. I read Ben’s restored Vatican talk and also the
    >> notes indicating Peter Robinson’s view. A text is not a series of pages. In
    >> any case I assume that it is easy to link from some viewer of
    >> tei-xml-encoded text to an instance of the universal viewer, but may be not
    >> so easy the other way round. The question is whether the data model in IIIF
    >> is well suited for modelling texts in the way TEI recommend.
    >> I will be interested in participating in a discussion about this.
    >>
    >> Best
    >> Christian-Emil
    >>
    >> _____________________
    >> From: TEI (Text Encoding Initiative) public discussion list
    >> <[hidden email]> on behalf of Martin Holmes <[hidden email]>
    >> Sent: 18 June 2017 19:27
    >> To: [hidden email]
    >> Subject: Re: IIIF and facs
    >>
    >> Hi Ben,
    >>
    >> I'd say there's a great deal more you can do than simply using pb/@facs
    >> to point at the highest-res image; the Representation of Primary Sources
    >> chapter has examples of using <surface> and <zone> to link components of
    >> a transcription to areas on an image, and of linking to multiple images
    >> at different resolutions:
    >>
    >> <facsimile>
    >>  <graphic url="page1.png"/>
    >>  <surface>
    >>  <graphic url="page2-highRes.png"/>
    >>  <graphic url="page2-lowRes.png"/>
    >>  </surface>
    >>  <graphic url="page3.png"/>
    >>  <graphic url="page4.png"/>
    >> </facsimile>
    >>
    >> <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHFAX>
    >>
    >> Cheers,
    >> Martin
    >>
    >> On 2017-06-18 04:24 AM, Ben Brumfield wrote:
    >>> Dear Colleagues,
    >>>
    >>> Two weeks ago, Patrick Cuba, John Howard, Peter Robinson, Jeffrey Witt
    >>> and I organized a discussion session on Connecting Text and IIIF at the
    >>> IIIF Conference at the Vatican.  While we each have different
    >>> perspectives expressed by our lightning talks, we agree on the need for
    >>> the TEI community to be involved in conversations about modeling text in
    >>> IIIF.
    >>>
    >>> My own talk, "Text Beyond Annotations" is online at
    >>> http://content.fromthepage.com/text-beyond-annotations-at-iiif-vatican/
    >>>
    >>> I'd be interested in discussing best practices for linking from TEI
    >>> documents to page facsimiles hosted on IIIF image services.  At the
    >>> moment I think that the only option we have is to insert a URL to a
    >>> maximum-resolution image into the *facs* attribute of *pb*.  I'd like to
    >>> preserve that option for TEI viewers that don't support IIIF, but is
    >>> there anything better we could do?
    >>>
    >>> Ben
    >>>
    >>> Ben W. Brumfield
    >>> Partner, Brumfield Labs
    >>> Creators of FromThePage <https://fromthepage.com/>
    >
    >
    >
   
   
   
    --
    @prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
    "KANZAKI Masahide"; :nick "masaka"; :email "[hidden email]"].
   
    ------------------------------
   
    End of TEI-L Digest - 27 Jun 2017 to 28 Jun 2017 (#2017-148)
    ************************************************************
   


Re: IIIF and facs (and TEI)

KANZAKI Masahide
In reply to this post by Lou Burnard-6
Hello,

IMHO the facs attribute should provide a generic image URI that can be
dereferenced by non-IIIF-aware applications.
"$ENDPOINT/full/full/0/default.jpg" is quite a specific pattern of the
IIIF Image API, so an IIIF-aware application could "guess" the
$ENDPOINT part and fetch info.json.

Of course, a URI following this pattern is not necessarily an IIIF URI
(though it likely is), and in some cases an image archive may want to
provide a different URI pattern, e.g. "$ENDPOINT/full/1000,/0/default.jpg".
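A minimal sketch of that "guess", under stated assumptions: the regular expression below only approximates the Image API request syntax `{base}/{region}/{size}/{rotation}/{quality}.{format}` (it is not the normative grammar), and it accepts any size value such as `full`, `max` or `1000,`. As noted above, a match does not prove the server speaks IIIF, so a real client would still confirm by fetching info.json (not shown). The function names and example.org URLs are hypothetical; the Bodleian-style URL comes from the example later in this message.

```python
import re
from typing import Optional

# Approximation (an assumption) of an IIIF Image API request URL.
IIIF_REQUEST = re.compile(
    r"^(?P<base>https?://.+)"
    r"/(?P<region>full|square|(?:pct:)?[\d.]+,[\d.]+,[\d.]+,[\d.]+)"
    r"/(?P<size>\^?(?:full|max|pct:[\d.]+|!?\d*,\d*))"
    r"/(?P<rotation>!?[\d.]+)"
    r"/(?P<quality>default|color|gray|bitonal)"
    r"\.(?P<format>jpg|tif|png|gif|jp2|pdf|webp)$"
)


def guess_iiif_endpoint(url: str) -> Optional[str]:
    """Return the candidate $ENDPOINT if url looks like an Image API request, else None."""
    match = IIIF_REQUEST.match(url)
    return match.group("base") if match else None


def guess_info_json(url: str) -> Optional[str]:
    """Where an IIIF-aware viewer would look for the image information document."""
    base = guess_iiif_endpoint(url)
    return f"{base}/info.json" if base else None


BODLEIAN = ("http://iiif.bodleian.ox.ac.uk/iiif/image/"
            "e6ad69d4-9b90-4afc-a32d-d4af0889f1b8/full/full/0/default.jpg")
print(guess_info_json(BODLEIAN))
print(guess_iiif_endpoint("https://images.example.org/iiif/page-152/full/1000,/0/default.jpg"))
print(guess_iiif_endpoint("https://example.org/photos/page152.jpg"))  # None: not IIIF-shaped
```

Because none of the parameter patterns can contain "/", the match always splits off exactly the last four path segments, which is why the non-default size "1000," is handled the same way as "full".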

I think a sort of "well-known" prefix would be one idea. Let's say
"IIIF" is a well-known prefix indicating that the expanded URI is IIIF
Image API compliant. Then

<prefixDef ident="IIIF" matchPattern="([\w\d-]+)"
    replacementPattern="http://iiif.bodleian.ox.ac.uk/iiif/image/$1/full/full/0/default.jpg"/>
...

<pb n="152" facs="IIIF:e6ad69d4-9b90-4afc-a32d-d4af0889f1b8"/>

will work for non-IIIF apps, and IIIF apps can confidently find
$ENDPOINT (though I'm afraid it's a bit hard for non-TEI-aware XML
apps to resolve this prefix).
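A minimal sketch of how a TEI-aware processor might apply that prefixDef, assuming (1) matchPattern must match the whole string after "IIIF:" and (2) "$1" in replacementPattern refers to the first capture group. Python's `re` dialect is not identical to the regular-expression flavour TEI assumes, but the two agree on this pattern. `PrefixDef`, `expand_pointer` and `iiif_endpoint` are hypothetical names; the Bodleian pattern is the one from the example above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixDef:
    """Hypothetical in-memory form of a TEI <prefixDef> element."""
    ident: str
    match_pattern: str
    replacement_pattern: str  # refers to capture groups as $1, $2, ...


def expand_pointer(pointer: str, defs: List[PrefixDef]) -> str:
    """Expand a private URI such as IIIF:abc; return it unchanged if no prefixDef applies."""
    prefix, colon, rest = pointer.partition(":")
    if not colon:
        return pointer
    for d in defs:
        if d.ident != prefix:
            continue
        # Assumption: the pattern must match everything after "prefix:".
        m = re.fullmatch(d.match_pattern, rest)
        if m:
            # Replace each $n with the text of capture group n.
            return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), d.replacement_pattern)
    return pointer


def iiif_endpoint(pointer: str, defs: List[PrefixDef]) -> Optional[str]:
    """For the agreed "IIIF" prefix, drop the four request parameters from the expansion."""
    if not pointer.startswith("IIIF:"):
        return None
    expanded = expand_pointer(pointer, defs)
    if expanded == pointer:  # no prefixDef matched
        return None
    return expanded.rsplit("/", 4)[0]


DEFS = [PrefixDef(
    ident="IIIF",
    match_pattern=r"([\w\d-]+)",
    replacement_pattern="http://iiif.bodleian.ox.ac.uk/iiif/image/$1/full/full/0/default.jpg",
)]
facs = "IIIF:e6ad69d4-9b90-4afc-a32d-d4af0889f1b8"
print(expand_pointer(facs, DEFS))  # URL a non-IIIF viewer would load
print(iiif_endpoint(facs, DEFS))   # base URI an IIIF viewer would use
```

The design relies on the prefix name itself carrying the "this is IIIF" signal, so the IIIF-aware path needs no pattern guessing; the cost, as noted above, is that XML tools unaware of TEI prefixDef cannot expand the pointer at all.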

cheers,

2017-06-29 19:31 GMT+09:00 Lou Burnard <[hidden email]>:

> On 29/06/17 02:47, Ben Brumfield wrote:
>>
>>   I gather that the value of a facs attribute can refer to nearly
>> anything, and need not be a URL.  Is there a way to add IIIF-specific data
>> to facs?  Should that be better addressed by another attribute on pb?
>
>
> The value of a @facs attribute must conform to the W3C URI syntax. So it can
> be any kind of pointer. I am not sure what you mean by "IIIF-specific data"
> but if it can be expressed as a URI (which I would be surprised if not) then
> you're home and dry.
> Of course you're not doing the non-IIIF user any favours by using something
> which is entirely meaningless except to IIIF-aware processors. The TEI
> prefix definition mechanism might perhaps help here: you could do <pb
> facs="IIIF:image.jpg"/> throughout your document and then define a mapping
> between the string "IIIF" and a full URL using <prefixDef>
>
>
>> I'm imagining that something basic like <pb
>> facs="$ENDPOINT/full/full/0/default.jpg"> (which would work for a viewer
>> unaware of IIIF) could be expanded along the lines of
>> <pb facs="$ENDPOINT/full/full/0/default.jpg" iiif="$ENDPOINT"> or perhaps
>> <pb facs="$ENDPOINT/full/full/0/default.jpg; iiif=$ENDPOINT"> but this is
>> really new territory for me, and could use advice on existing practice.
>
>
> Not sure I understand your examples here. The second two seem identical.
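The last two forms in the quoted examples are not equivalent once a processor reads @facs, assuming it follows the TEI definition of @facs as one or more pointers separated by whitespace. A minimal sketch of that tokenization (the function name is hypothetical, and "$ENDPOINT" is kept as a literal placeholder): the separate-attribute form leaves @facs as a single pointer, while packing "; iiif=$ENDPOINT" into @facs yields two pointers, the first with a stray trailing semicolon. Note also that @iiif is not a TEI attribute on pb, so the separate-attribute form would need a customization.

```python
from typing import List


def facs_pointers(value: str) -> List[str]:
    """Tokenize a @facs value as one or more whitespace-separated pointers."""
    return value.split()


separate_attribute = "$ENDPOINT/full/full/0/default.jpg"  # @facs when @iiif is its own attribute
packed_into_facs = "$ENDPOINT/full/full/0/default.jpg; iiif=$ENDPOINT"

print(facs_pointers(separate_attribute))  # ['$ENDPOINT/full/full/0/default.jpg']
print(facs_pointers(packed_into_facs))    # ['$ENDPOINT/full/full/0/default.jpg;', 'iiif=$ENDPOINT']
```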



--
@prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
"KANZAKI Masahide"; :nick "masaka"; :email "[hidden email]"].