Cocoa texts and conversions?


Martin Holmes
Hi all,

The TEI Stylesheets system includes code for converting the old COCOA
text encoding system:

<https://en.wikipedia.org/wiki/COCOA_(digital_humanities)>

into TEI. That code is not actually working; it produces invalid TEI, as
you can see if you run "make test" and then validate cocoatest.xml or
cocoatest2.xml. Obviously that should be fixed, but it occurs to me to
wonder whether there's actually any utility at this point in maintaining
a Cocoa-to-TEI converter at all. The only archive I'm aware of with
texts in Cocoa is the Oxford Text Archive, and as far as I can see from
a quick investigation, all of those files have been converted to TEI
already.

Does anyone know of any significant texts encoded in Cocoa which have
not yet been converted to TEI? If there are none left in the wild, we
could quietly retire this particular conversion format and reduce our
maintenance and testing burden.

Cheers,
Martin

Re: Cocoa texts and conversions?

Lou Burnard-6
It wouldn't surprise me if there were still some COCOA-style marked up
texts out there in the wild, unlikely as it seems. If there are any,
though, they might well find this conversion instructive, largely
because of what it doesn't do.

From a TEI standpoint, the syntax of COCOA files suffers three major
drawbacks:

a) you can only mark points in the text with things that look like tags,
not spans
b) if you must mark spans, you do it with arbitrary "special characters"
that act like brackets
c) there's no way of specifying what any of the markup is supposed to
mean, so deciding how to map it to TEI has to be done on a case by case
basis.

Probably the only useful (i.e. generic) COCOA convertor would be one
that turns e.g. <X foo> into <milestone unit="X" value="foo"/> and
%wibble% into <hi rend="percent">wibble</hi> (and even then you'd have
to make assumptions about which characters count as "special").

Oh, and of course in really hardcore COCOA files, you can redefine the
characters to mean absolutely anything, e.g. whether they are
punctuation, "special characters", space equivalents, multiple digraphs,
or whatever. That's just how we used to roll pre-Unicode, children.
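
For readers who want to see the generic mapping described above in concrete terms, here is a minimal Python sketch. It is not the TEI Stylesheets code. It assumes that references have the shape <X foo> (a category code, whitespace, then a value) and that % is the only span delimiter. The function names and the "percent" rend value are made up for illustration, and a real corpus would need its own table of special characters.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a COCOA reference is <CODE value>, e.g. <X foo>.
TAG = re.compile(r"<(\w+)\s+([^<>]*)>")

# Assumption: characters that bracket spans, mapped to illustrative rend
# values. Which characters count as "special" varies from corpus to corpus.
SPECIAL = {"%": "percent"}


def _spans(chunk, special):
    """Escape plain text, then wrap delimited spans in <hi>."""
    chunk = escape(chunk)
    for char, rend in special.items():
        pattern = re.compile(re.escape(char) + r"(.+?)" + re.escape(char), re.S)
        chunk = pattern.sub(
            lambda m, rend=rend: "<hi rend=%s>%s</hi>" % (quoteattr(rend), m.group(1)),
            chunk,
        )
    return chunk


def cocoa_to_tei(text, special=SPECIAL):
    """Turn references into <milestone/> and delimited spans into <hi>."""
    out = []
    pos = 0
    for match in TAG.finditer(text):
        # Text between references may contain delimited spans.
        out.append(_spans(text[pos:match.start()], special))
        out.append(
            "<milestone unit=%s value=%s/>"
            % (quoteattr(match.group(1)), quoteattr(match.group(2).strip()))
        )
        pos = match.end()
    out.append(_spans(text[pos:], special))
    return "".join(out)


print(cocoa_to_tei("<X foo>some %wibble% text"))
```

The output is a fragment, not a valid TEI document: deciding what each milestone or highlighted span actually means still has to be done per corpus.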


  On 06/01/17 18:26, Martin Holmes wrote:

> Hi all,
>
> The TEI Stylesheets system includes code for converting the old COCOA
> text encoding system:
>
> <https://en.wikipedia.org/wiki/COCOA_(digital_humanities)>
>
> into TEI. That code is not actually working; it produces invalid TEI,
> as you can see if you run "make test" and then validate cocoatest.xml
> or cocoatest2.xml. Obviously that should be fixed, but it occurs to me
> to wonder whether there's actually any utility at this point in
> maintaining a Cocoa-to-TEI converter at all. The only archive I'm
> aware of with texts in Cocoa is the Oxford Text Archive, and as far as
> I can see from a quick investigation, all of those files have been
> converted to TEI already.
>
> Does anyone know of any significant texts encoded in Cocoa which have
> not yet been converted to TEI? If there are none left in the wild, we
> could quietly retire this particular conversion format and reduce our
> maintenance and testing burden.
>
> Cheers,
> Martin

Re: Cocoa texts and conversions?

Martin Holmes
James and Sebastian seem to have written the conversion in 2010; I
wonder if it was really just a tool to convert the OTA texts to TEI,
rather than something more generic.

I found this discussion too, which I'd forgotten about:

<http://tei-l.970651.n3.nabble.com/COCOA-anyone-td4026219.html>

and as Piotr says there, the Wikipedia entry for COCOA advertises our
continuing support for it, so if we do retire this conversion rather
than try to fix it (if that's even possible), we'll have to remember to
update the Wikipedia entry too.

Cheers,
Martin

On 2017-01-06 12:17 PM, Lou Burnard wrote:

> It wouldn't surprise me if there were still some COCOA-style marked up
> texts out there in the wild, unlikely as it seems. If there are any,
> though, they might well find this conversion instructive, largely
> because of what it doesn't do.
>
> From a TEI standpoint, the syntax of COCOA files  suffers three major
> drawbacks:
>
> a) you can only mark points in the text with things that look like tags,
> not spans
> b) if you must mark spans, you do it with arbitrary "special characters"
> that act like brackets
> c) there's no way of specifying what any of the markup is supposed to
> mean, so deciding how to map it to TEI has to be done on a case by case
> basis.
>
> Probably the only useful (i.e. generic) COCOA convertor would be one
> that turns e.g. <X foo> into <milestone unit="X" value="foo"/> and
> %wibble% into <hi rend="percent">wibble</hi> (and even then you'd have
> to make assumptions about which characters count as "special").
>
> Oh, and of course in really hardcore COCOA files, you can redefine the
> characters to mean absolutely anything, e.g. whether they are
> punctuation, "special characters", space equivalents, multiple digraphs,
> or whatever. That's just how we used to roll pre-Unicode, children.

Re: Cocoa texts and conversions?

Serge Heiden-2
And before that, there was the COCOA investigation thread: <http://tei-l.970651.n3.nabble.com/Can-you-identify-this-old-mid-1980s-markup-td4023584.html>.

Best,
Serge

On 06/01/2017 at 21:24, Martin Holmes wrote:

> James and Sebastian seem to have written the conversion in 2010; I wonder if it was really just a tool to convert the OTA texts to TEI, rather than something more generic.
>
> I found this discussion too, which I'd forgotten about:
>
> <http://tei-l.970651.n3.nabble.com/COCOA-anyone-td4026219.html>
>
> and as Piotr says there, the Wikipedia entry for COCOA advertises our continuing support for it, so if we do retire this conversion rather than try to fix it (if that's even possible), we'll have to
> remember to update the Wikipedia entry too.
>
> Cheers,
> Martin

--
Dr. Serge Heiden, [hidden email], http://textometrie.ens-lyon.fr
ENS de Lyon - IHRIM UMR5317
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883

Re: Cocoa texts and conversions?

Lou Burnard-6
In reply to this post by Martin Holmes
I'd forgotten that discussion on TEI-L too, and it's very entertaining!
I also think the claim on Wikipedia that the TEI maintains this
conversion needs a bit of attention. At best we could point to it as an
example of how such a conversion might be done. But you need to
understand both COCOA and XSLT to do it, and the number of such people
available is dwindling daily...

I also just remembered that I once wrote an article documenting how I
converted John Burrows's Austen texts (which were marked up in COCOA) to
TEI. If I can find it, I'll post it.



On 06/01/17 20:24, Martin Holmes wrote:

> James and Sebastian seem to have written the conversion in 2010; I
> wonder if it was really just a tool to convert the OTA texts to TEI,
> rather than something more generic.
>
> I found this discussion too, which I'd forgotten about:
>
> <http://tei-l.970651.n3.nabble.com/COCOA-anyone-td4026219.html>
>
> and as Piotr says there, the Wikipedia entry for COCOA advertises our
> continuing support for it, so if we do retire this conversion rather
> than try to fix it (if that's even possible), we'll have to remember
> to update the Wikipedia entry too.
>
> Cheers,
> Martin

Re: Cocoa texts and conversions?

Robinson, Peter
While on this subject: dear old Collate originally used a modified COCOA markup (modified in that it introduced explicit end tags alongside those implied by COCOA). So there were many, many thousands of transcription files prepared in Collate (e.g. around 10,000 for the Canterbury Tales, around half that for the Dante Commedia, 1,800 for one chapter of the Gospel of John, blah blah). Lots of those were converted to TEI long ago, so that horse is now flown (such an awful mixed metaphor that I have to let it stand). There might be a few unconverted files out there, but they are probably unconverted because their creators ceased to love them enough (or went and got a real job, or something). I remember the conversion from Collate being quite straightforward, as Collate defined categories of elements as either milestone or content element.
Peter

> On Jan 6, 2017, at 9:39 PM, Lou Burnard <[hidden email]> wrote:
>
> I'd forgotten that discussion on TEI-L too, and it's very entertaining! I also think the claim on Wikipedia that the TEI maintains this conversion needs a bit of attention. At best we could point to it as an example of how such a conversion might be done. But you need to understand both COCOA and XSLT to do it, and the number of such people available is dwindling daily...
>
> I also just remembered that I once wrote an article documenting how I converted John Burrows's Austen texts (which were marked up in COCOA) to TEI. If I can find it, I'll post it.

Re: Cocoa texts and conversions?

Elisabeth Burr-2
In reply to this post by Lou Burnard-6
Yes there are. There is for example my corpus of Romance Newspapers,
parts of which can even be consulted online - thanks to the old TactWeb,
which in my eyes should have been developed further, because the ideas
behind it were great. See:
http://home.uni-leipzig.de/burr/CorpusLing/Korpusanalyse/default.htm.
Just click on one of the links.

I know that there are problems. One edition of Le Monde was uploaded
in the wrong format (DOS-Text or Plain-Text, I can't remember). I
never found the time to do something about it, but the material is there
and should be converted. The Oxford Text Archive has converted part of
the Italian Newspaper Language corpus that I contributed to the Archive,
but as far as I remember the conversion was done by a student and most
probably needs to be checked.

To keep things short, conversion from COCOA to TEI is not obsolete yet.

Best Elisabeth who wishes the TEI people the very best for the new year


On 06.01.2017 at 21:17, Lou Burnard wrote:

> It wouldn't surprise me if there were still some COCOA-style marked up
> texts out there in the wild, unlikely as it seems. If there are any,
> though, they might well find this conversion instructive, largely
> because of what it doesn't do.
>
> From a TEI standpoint, the syntax of COCOA files  suffers three major
> drawbacks:
>
> a) you can only mark points in the text with things that look like
> tags, not spans
> b) if you must mark spans, you do it with arbitrary "special
> characters" that act like brackets
> c) there's no way of specifying what any of the markup is supposed to
> mean, so deciding how to map it to TEI has to be done on a case by
> case basis.
>
> Probably the only useful (i.e. generic) COCOA convertor would be one
> that turns e.g. <X foo> into <milestone unit="X" value="foo"/> and
> %wibble% into <hi rend="percent">wibble</hi> (and even then you'd have
> to make assumptions about which characters count as "special").
>
> Oh, and of course in really hardcore COCOA files, you can redefine the
> characters to mean absolutely anything, e.g. whether they are
> punctuation, "special characters", space equivalents, multiple
> digraphs, or whatever. That's just how we used to roll pre-Unicode,
> children.

--

Prof. Dr. Elisabeth Burr
Lehrstuhl Französische / frankophone und italienische Sprachwissenschaft
Geschäftsführende Direktorin des Instituts für Romanistik
Präsidentin der European Association for Digital Humanities (EADH)
Universität Leipzig
Beethovenstr. 15
D-04107 Leipzig
http://home.uni-leipzig.de/burr/
http://www.dhd2016.de/
http://www.culingtec.uni-leipzig.de/ESU_C_T/
http://www.culingtec.uni-leipzig.de/quebec/
http://www.uni-leipzig.de/gal2010
http://www.uni-leipzig.de/~burr/JISU

Re: Cocoa texts and conversions?

Stuart A. Yeates
In reply to this post by Lou Burnard-6


On Sat, Jan 7, 2017 at 9:39 AM, Lou Burnard <[hidden email]> wrote:
I also think the claim on Wikipedia that the TEI maintains this conversion needs a bit of attention. 

OK, that's probably my fault. If someone who understands the situation better than I has ten minutes to hit edit on-wiki and explain, that would be great.

cheers
stuart 
 

Re: Cocoa texts and conversions?

Elisa Beshero-Bondar-2
This is an interesting thread for those of us, like me anyhow, who increasingly find ourselves in a position to repurpose old 1990s or even 1980s code and begin to realize that our markup universe was once a lot more diverse, and also that those old projects will require some serious TLC to "repurpose" now. I wonder whether COCOA and friends really require pipeline stages of conversion, something like what Wendell Piez does to map LMNL to TEI. It also sounds like every conversion is a custom-fit process, and I think we ought to develop more guiding explanation on how to assess, plan, and map out conversion processes like these. In other words, more documentation, since we can't provide a magic-bullet stylesheet.

I'm supposed to be working on a related project about the "up-conversion" process, part of whose title is "Starting from Mess". It only looks like "mess" now, when conversion isn't a simple one-step process and when we realize how much serious work may be involved in curating a project only a few decades old.

Elisa

--
Elisa Beshero-Bondar, PhD 
Director, Center for the Digital Text
Associate Professor of English 
University of Pittsburgh at Greensburg
150 Finoli Drive, Greensburg, PA 15601 USA
E-mail: [hidden email] | Development site: http://newtfire.org

Typeset by hand on my iPad

On Jan 7, 2017, at 4:04 AM, Stuart A. Yeates <[hidden email]> wrote:



On Sat, Jan 7, 2017 at 9:39 AM, Lou Burnard <[hidden email]> wrote:
I also think the claim on Wikipedia that the TEI maintains this conversion needs a bit of attention. 

OK, that's probably my fault. If someone who understands the situation better than I has ten minutes to hit edit on-wiki and explain, that would be great.

cheers
stuart 
 

Re: Cocoa texts and conversions?

Martin Holmes
In reply to this post by Stuart A. Yeates
On 2017-01-07 01:04 AM, Stuart A. Yeates wrote:

>
>
> On Sat, Jan 7, 2017 at 9:39 AM, Lou Burnard
> <[hidden email]> wrote:
>
>     I also think the claim on Wikipedia that the TEI maintains this
>     conversion needs a bit of attention.
>
>
> OK, that's probably my fault. If someone who understands the situation
> better than I has ten minutes to hit edit on-wiki and explain, that
> would be great.

What I'm actually trying to do is understand the situation better, and
I'm getting there. This is what I think we've learned:

  - Cocoa is not very amenable to a generic conversion process, since
its syntax and usage are extremely variable;

  - Our current Cocoa conversion script (James will hopefully confirm or
refute this) was written specifically to convert certain OTA texts, and
could not be applied generically;

  - In any case, that script is broken (it produces invalid TEI);

  - There is a possible way forward in writing a conversion that creates
milestone elements, as suggested by Lou, but that would still depend on
knowledge of the actual syntax and special characters used in any
specific set of texts;

  - There are still some Cocoa texts out there which have not yet been
converted to TEI.

What I take from this is that it would probably make more sense to work
with those people such as Elisabeth who have Cocoa texts that matter,
and get those converted to TEI; meanwhile we could retire the conversion
that doesn't work, and remove it from the Stylesheets repo. This is
obviously a question for Council, though.

Cheers,
Martin


Re: Cocoa texts and conversions?

Elisa Beshero-Bondar-2
Martin-- Agreed! It's a good topic for the February face-to-face meeting.

--Elisa

Sent from my iPhone

> On Jan 7, 2017, at 1:40 PM, Martin Holmes <[hidden email]> wrote:
>
>> On 2017-01-07 01:04 AM, Stuart A. Yeates wrote:
>>
>>
>> On Sat, Jan 7, 2017 at 9:39 AM, Lou Burnard
>> <[hidden email]> wrote:
>>
>>    I also think the claim on Wikipedia that the TEI maintains this
>>    conversion needs a bit of attention.
>>
>>
>> OK, that's probably my fault. If someone who understands the situation
>> better than I has ten minutes to hit edit on-wiki and explain, that
>> would be great.
>
> What I'm actually trying to do is understand the situation better, and I'm getting there. This is what I think we've learned:
>
> - Cocoa is not very amenable to a generic conversion process, since its syntax and usage are extremely variable;
>
> - Our current Cocoa conversion script (James will hopefully confirm or refute this) was written specifically to convert certain OTA texts, and could not be applied generically;
>
> - In any case, that script is broken (it produces invalid TEI);
>
> - There is a possible way forward in writing a conversion that creates milestone elements, as suggested by Lou, but that would still depend on knowledge of the actual syntax and special characters used in any specific set of texts;
>
> - There are still some Cocoa texts out there which have not yet been converted to TEI.
>
> What I take from this is that it would probably make more sense to work with those people such as Elisabeth who have Cocoa texts that matter, and get those converted to TEI; meanwhile we could retire the conversion that doesn't work, and remove it from the Stylesheets repo. This is obviously a question for Council, though.
>
> Cheers,
> Martin

Re: Cocoa texts and conversions?

Martin Holmes
In reply to this post by Elisabeth Burr-2
Hi Elisabeth,

It's fascinating to see TactWeb still running and working well. I've
never actually used it myself, but it still has a home page:

<http://tactweb.cch.kcl.ac.uk/doc/tact.htm>

which suggests that it uses its own Tact Database format (TDB) which you
create with DOS TACT: "You will need basic DOS TACT (version 2.x) to
create the databases that TACTweb can serve. All TACT software is
available for free for academic purposes, and can be downloaded from
ftp://chass.utoronto.ca/pub/cch/tact/tact2.1/. A special TACT component
that is still being developed by the author is sgml2tdb -- software that
can take a text marked up in an SGML markup scheme, and create a TACT
database. This special TACT component is available from
ftp://ic-unix.ic.utoronto.ca/pub/sgml2tdb."
<http://tactweb.cch.kcl.ac.uk/doc/home.htm>

The download is no longer there, though.

This page suggests that you use the TACT MAKBAS function to create the
TDB file:

<http://users.ox.ac.uk/~ctitext2/enquiry/tat06.html>

and you set specific configuration settings in MAKBAS to tell it whether
you are using markup (i.e. some flavour of Cocoa), and if so, what your
opening and closing brackets are.
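
As a rough illustration of that kind of per-corpus setting (this is not MAKBAS's actual configuration format, which I haven't examined), a reader could be parameterised on the bracket characters like this. The MarkupProfile class, its field names, and the sample text are all hypothetical, and the sketch assumes single-character brackets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkupProfile:
    """Hypothetical per-corpus settings; not MAKBAS's real option names."""
    uses_markup: bool = True
    open_bracket: str = "<"   # assumed to be a single character
    close_bracket: str = ">"  # assumed to be a single character


def tokenize(text, profile):
    """Split text into ('ref', code, value) and ('text', string) tokens."""
    if not profile.uses_markup:
        return [("text", text)]
    o = re.escape(profile.open_bracket)
    c = re.escape(profile.close_bracket)
    # A reference is: open bracket, a code, optional value, close bracket.
    ref = re.compile(o + r"\s*([^\s" + c + r"]+)\s*([^" + c + r"]*?)\s*" + c)
    tokens = []
    pos = 0
    for match in ref.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("ref", match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


curly = MarkupProfile(open_bracket="{", close_bracket="}")
for token in tokenize("{A Austen}It is a truth{C 1}", curly):
    print(token)
```

Keeping the bracket choice in a small profile object means the same reader can be pointed at texts that use different conventions without touching the parsing code.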

For those documents you have which are still in Cocoa, can they be made
public? If so, I bet if you share them online (through e.g. GitHub),
someone with a couple of free afternoons will probably help with a
conversion to TEI, especially if the Cocoa markup is simple or
well-documented.

Cheers,
Martin

On 2017-01-06 04:28 PM, Elisabeth Burr wrote:

> Yes there are. There is for example my corpus of Romance Newspapers,
> parts of which can even be consulted online - thanks to the old TactWeb,
> which in my eyes should have been developed further, because the ideas
> behind it were great. See:
> http://home.uni-leipzig.de/burr/CorpusLing/Korpusanalyse/default.htm.
> Just click on one of the links.
>
> I know that there are problems. One edition of Le Monde was uploaded
> in the wrong format (DOS-Text or Plain-Text, I can't remember). I
> never found the time to do something about it, but the material is there
> and should be converted. The Oxford Text Archive has converted part of
> the Italian Newspaper Language corpus that I contributed to the Archive,
> but as far as I remember the conversion was done by a student and most
> probably needs to be checked.
>
> To keep things short, conversion from COCOA to TEI is not obsolete yet.
>
> Best Elisabeth who wishes the TEI people the very best for the new year
>
>
> Am 06.01.2017 um 21:17 schrieb Lou Burnard:
>> It wouldn't surprise me if there were still some COCOA-style marked up
>> texts out there in the wild, unlikely as it seems. If there are any,
>> though, they might well find this conversion instructive, largely
>> because of what it doesn't do.
>>
>> From a TEI standpoint, the syntax of COCOA files  suffers three major
>> drawbacks:
>>
>> a) you can only mark points in the text with things that look like
>> tags, not spans
>> b) if you must mark spans, you do it with arbitrary "special
>> characters" that act like brackets
>> c) there's no way of specifying what any of the markup is supposed to
>> mean, so deciding how to map it to TEI has to be done on a case by
>> case basis.
>>
>> Probably the only useful (i.e. generic) COCOA convertor would be one
>> that turns e.g. <X foo> into <milestone unit="X" value="foo"/> and
>> %wibble% into <hi rend="percent">wibble</hi> (and even then you'd have
>> to make assumptions about which characters count as "special").
>>
>> Oh, and of course in really hardcore COCOA files, you can redefine the
>> characters to mean absolutely anything, e.g. whether they are
>> punctuation, "special characters", space equivalents, multiple
>> digraphs, or whatever. That's just how we used to roll pre-Unicode
>> children.
>>
>>
>>  On 06/01/17 18:26, Martin Holmes wrote:
>>> Hi all,
>>>
>>> The TEI Stylesheets system includes code for converting the old COCOA
>>> text encoding system:
>>>
>>> <https://en.wikipedia.org/wiki/COCOA_(digital_humanities)>
>>>
>>> into TEI. That code is not actually working; it produces invalid TEI,
>>> as you can see if you run "make test" and then validate cocoatest.xml
>>> or cocoatest2.xml. Obviously that should be fixed, but it occurs to
>>> me to wonder whether there's actually any utility at this point in
>>> maintaining a Cocoa-to-TEI converter at all. The only archive I'm
>>> aware of with texts in Cocoa is the Oxford Text Archive, and as far
>>> as I can see from a quick investigation, all of those files have been
>>> converted to TEI already.
>>>
>>> Does anyone know of any significant texts encoded in Cocoa which have
>>> not yet been converted to TEI? If there are none left in the wild, we
>>> could quietly retire this particular conversion format and reduce our
>>> maintenance and testing burden.
>>>
>>> Cheers,
>>> Martin
>>
>>
>
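
The generic conversion Lou suggests in the quoted message above (turning
<X foo> into a milestone, and text wrapped in a "special character" into
<hi>) might be sketched roughly as follows. This is only an illustration
in Python, not the code in the TEI Stylesheets: the function names, the
regular expression, and the default table of special characters are all
assumptions, and real COCOA files may use quite different conventions.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a COCOA reference looks like <X foo>, i.e. a category name,
# whitespace, then a value. Real files may not follow this shape.
TAG_RE = re.compile(r"<(\w+)\s+([^<>]*)>")


def _convert_spans(chunk, specials):
    """Escape plain text, then wrap char-delimited spans in <hi>."""
    chunk = escape(chunk)
    for char, rend in specials.items():
        # Limitation: a special character that also occurs in XML escapes
        # or in the generated markup (e.g. '&' or '"') would need more care.
        pattern = re.compile(re.escape(char) + r"(.+?)" + re.escape(char))
        chunk = pattern.sub(
            lambda m, r=rend: "<hi rend=%s>%s</hi>" % (quoteattr(r), m.group(1)),
            chunk,
        )
    return chunk


def cocoa_to_milestones(text, specials=None):
    """Turn <X foo> into <milestone/> and paired specials into <hi>."""
    # Hypothetical mapping; which characters count as "special" has to be
    # decided per corpus.
    specials = specials or {"%": "percent"}
    out = []
    pos = 0
    for match in TAG_RE.finditer(text):
        out.append(_convert_spans(text[pos:match.start()], specials))
        unit, value = match.group(1), match.group(2).strip()
        out.append(
            "<milestone unit=%s value=%s/>" % (quoteattr(unit), quoteattr(value))
        )
        pos = match.end()
    out.append(_convert_spans(text[pos:], specials))
    return "".join(out)


print(cocoa_to_milestones("<X foo>some %wibble% text"))
# <milestone unit="X" value="foo"/>some <hi rend="percent">wibble</hi> text
```

The table of special characters is passed in rather than hard-coded
because, as noted above, it has to be worked out for each set of texts.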

Re: Cocoa texts and conversions?

James Cummings-4
In reply to this post by Martin Holmes
Hi Martin,

Apologies for the delay in answering.  In case anyone is
interested in their history: Yes, that initial cocoa-to-xsl
script was indeed written by me (and then improved substantially
into doing multiple passes by Sebastian with additional debugging
output).

My first attempts at doing so were to make the Cocoa file
well-formed XML (by wrapping a root element around it and
changing any individual tags into milestone-like pseudo-elements),
which was done in Perl before using XSLT. I then ran that through
two stylesheets: one to transform the flat Cocoa markup into a
flattened TEI markup (i.e. rename things), and the next to fill
that flattened TEI with structure (using xsl:for-each-group).

The original files were written in 2004, with the express purpose
of being used for a subset of the OTA consisting specifically of
verse drama, and I even spoke about this at ALLC-ACH 2004 in
Gothenburg. They are available for now at
http://users.ox.ac.uk/~jamesc/research/cocoa2tei/ where I put
them in early 2006. I was doing this in a very modular way with
the intent of applying it to different sorts of texts as well.
Partly this was created to estimate, for a (failed) funding bid
by the OTA, how much work it would take to convert a large
portion of the OTA archives to TEI (whilst maintaining the
original files of course).  [If I remember the reviews correctly
it failed partly because one reviewer was outraged that mere
technicians might *change* the files lovingly crafted by
academics, even though the bid clearly spelled out that this was
only format conversion and that the original files would also
remain available.  Ah well.]

In 2010, as part of a business-as-usual drive to migrate some of
the OTA texts into TEI P5, my email archive shows that Sebastian
started working on a new cocoa-to-tei conversion. He took my XSLT
as the starting point and added the regex and extra passes to do
it all in a single XSLT, while also making it a bit more general.
I contributed some bits and pieces.  That is probably why it
isn't as generalised as it could be.  But I think it serves
better as an example than as a generalised conversion.
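
For readers unfamiliar with the last step described above (filling the
flattened markup with structure using xsl:for-each-group), here is a
rough Python analogue of a "group-starting-with" pass. It is a sketch
under stated assumptions, not the original XSLT: the function name, the
un-namespaced milestone element, and the choice of a div wrapper with
type and n attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET


def group_starting_with(flat_root, unit, wrapper="div"):
    """Rough analogue of xsl:for-each-group with group-starting-with.

    Each <milestone unit=UNIT> child of flat_root opens a new wrapper
    element; the siblings that follow are moved into that wrapper until
    the next matching milestone.
    """
    grouped = ET.Element(flat_root.tag)
    grouped.text = flat_root.text  # text before the first child stays put
    current = None
    for child in list(flat_root):
        if child.tag == "milestone" and child.get("unit") == unit:
            # Hypothetical mapping: unit becomes @type, value becomes @n.
            current = ET.SubElement(
                grouped, wrapper, {"type": unit, "n": child.get("value", "")}
            )
            # Text that followed the milestone becomes the wrapper's text.
            current.text = child.tail
        elif current is None:
            grouped.append(child)  # material before the first milestone
        else:
            current.append(child)
    return grouped


flat = ET.fromstring(
    '<body><milestone unit="A" value="1"/>first<l>x</l>'
    '<milestone unit="A" value="2"/>second</body>'
)
print(ET.tostring(group_starting_with(flat, "A"), encoding="unicode"))
```

Running the pass once per milestone unit, outermost unit first, is one
way a flat file could be given nested structure; whether that fits a
given text depends on how its COCOA references were used.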

Although the OTA is now in the Bodleian Libraries, not here at IT
Services, I doubt it will ever get rid of its Cocoa texts (even
if it provides migrated versions of them). While it may be easier
to just get rid of the conversion I would certainly be willing to
give it a go in trying to fix it. (i.e. If you've made a github
issue, feel free to assign to me.) As to why we should have and
maintain legacy conversions like this, I would say that it is as
an example for those approaching similar migration
up-conversions.  I'm not saying that is a very good reason, but
since it used to work I'm suspecting that making its output valid
again wouldn't necessarily be too difficult.

-James

On 07/01/17 18:40, Martin Holmes wrote:

> What I'm actually trying to do is understand the situation
> better, and I'm getting there. This is what I think we've learned:
>
>  - Cocoa is not very amenable to a generic conversion process,
> since its syntax and usage is extremely variable;
>
>  - Our current Cocoa conversion script (James will hopefully
> confirm or refute this) was written specifically to convert
> certain OTA texts, and could not be applied generically;
>
>  - In any case, that script is broken (it produces invalid TEI);
>
>  - There is a possible way forward in writing a conversion that
> creates milestone elements, as suggested by Lou, but that would
> still depend on knowledge of the actual syntax and special
> characters used in any specific set of texts;
>
>  - There are still some Cocoa texts out there which have not
> yet been converted to TEI.
>
> What I take from this is that it would probably make more sense
> to work with those people such as Elisabeth who have Cocoa
> texts that matter, and get those converted to TEI; meanwhile we
> could retire the conversion that doesn't work, and remove it
> from the Stylesheets repo. This is obviously a question for
> Council, though.
>
> Cheers,
> Martin
>
>>
>> cheers
>> stuart
>>


--
Dr James Cummings, Academic IT Services, University of Oxford,
TEI Consultations: [hidden email]

Re: Cocoa texts and conversions?

Martin Holmes
Hi James,

Thanks for the clarification. It sounds like we should continue to
support Cocoa as well as we can. I've made an issue and assigned it to you:

<https://github.com/TEIC/Stylesheets/issues/222>

All the best,
Martin


Re: Cocoa texts and conversions?

Martin Wynne-4
In reply to this post by Martin Holmes
Thanks to James for a thorough explanation of the history of COCOA text
conversions in the OTA and for pointing to the scripts.

There is now a list online of the items in the Oxford Text Archive which
indicate in their catalogue records that they contain COCOA markup:

  http://ota.ox.ac.uk/cats/OTA_COCOA.html

As James suggests, we haven't removed or hidden the old COCOA texts,
even when they have been superseded by SGML and XML versions. They will
remain available online, even if they are mainly of interest to
historians of text encoding.

Best wishes,
Martin Wynne

--
Oxford Text Archive,
Bodleian Libraries,
University of Oxford
Tel: +44 1865 283813
[hidden email]

Re: Cocoa texts and conversions?

Piotr Banski
In reply to this post by James Cummings-4
Hi James,

I think the only good (or at least acceptable) argument is in your last
line (plus your request to get this assigned):

 > since it used to work I'm suspecting that making its output valid
again wouldn't necessarily be too difficult

-- otherwise, I can't see why the Council should devote any time to
up-conversion from COCOA or any legacy format. This should always, IMO,
be an issue for a local project (formal or informal: funded or performed
as e.g. a semester assignment or on a friend-to-friend basis). I feel
scared by the prospect that Council time (including F2F time) might get
allocated to such matters, seeing the flood of _current_ issues at GitHub.

Best regards,

   Piotr





Re: Cocoa texts and conversions?

James Cummings-4
Hi Piotr,

That is partly why I suggested it should be assigned to me (since
I know the original code).  I agree that Council has *more* than
enough work keeping up with maintenance of the Guidelines, as
well as the Stylesheets and other software. This would be very
low priority, and if I didn't have time to do it in the next
couple of releases I'd note as much on the issue, and either I or
someone else could remove the cocoa-to-xsl scripts.  Though I'm
curious what should be done when we do so. Previously we've just
moved things around as and when we please, or got rid of
stylesheets, and that is good, but it might be beneficial to
leave a note as to where these scripts could be found. (i.e. I
might fork just the cocoa-to-xsl and stick it in my repositories
in case it is useful to anyone, including me, in the future.) But
then do we have to maintain that pointer? Maybe best just to
delete and leave the existence of the closed issue on github as
documentation. ;-)

But honest, I'll give it a good 20 minutes of thought in trying
to fix it --  if I can't in that time, then maybe we'll get rid
of it.

-James




--
Dr James Cummings, [hidden email]
Academic IT Services, University of Oxford

Re: Cocoa texts and conversions?

Lou Burnard-6
In reply to this post by Piotr Banski
On 10/01/17 12:26, Piotr Banski wrote:
>
> -- otherwise, I can't see why the Council should devote any time to
> up-conversion from COCOA or any legacy format. This should always,
> IMO, be an issue for a local project (formal or informal: funded or
> performed as e.g. a semester assignment or on a friend-to-friend
> basis). I feel scared by the prospect that Council time (including F2F
> time) might get allocated to such matters, seeing the flood of
> _current_ issues at GitHub.

As I'm no longer on Council, and therefore not talking to myself, can I
say that I entirely agree with Piotr on this! Prioritizing its work load
is always a problem for any overstretched organization so I hope Council
will take on board this recommendation to resist the temptation to
tackle the easy or esoteric, but instead focus on the major problems
which others are unlikely to pick up or be in a position to handle.

L

Re: Cocoa texts and conversions?

Martin Holmes
In reply to this post by James Cummings-4
Hi James,

You're right that we have no mechanisms for deprecation in the
Stylesheets. But we could simply move deprecated code to the TEI wiki,
where there is already a collection of useful XSLT:

<http://wiki.tei-c.org/index.php/Category:XSLT>

Cheers,
Martin


Re: Cocoa texts and conversions?

Lou Burnard-6
It's not strictly speaking deprecated though, is it? Otherwise it
couldn't be considered useful ... quite the reverse in fact! It's just
unsupported, or non-strategic, or of minority interest, or something.






-------- Original message --------
From: Martin Holmes
Date:2017/01/10 13:44 (GMT+00:00)
To: [hidden email]
Subject: Re: Cocoa texts and conversions?

Hi James,

You're right that we have no mechanisms for deprecation in the
Stylesheets. But we could simply move deprecated code to the TEI wiki,
where there is already a collection of useful XSLT:

<http://wiki.tei-c.org/index.php/Category:XSLT>

Cheers,
Martin

On 2017-01-10 05:07 AM, James Cummings wrote:
> Hi Piotr,
>
> That is partly why I suggested it should be assigned to me (since I know
> the original code).  I agree that Council has *more* than enough work
> keeping up with maintenance of the Guidelines, as well as the
> Stylesheets and other software. This would be very low priority and if I
> didn't have time to do it in the next couple releases I'd note as much
> on the issue, and either I or someone could remove the cocoa-to-xsl
> scripts.  Though I'm curious what should be done when we do so:
> previously we've just moved things around as and when we please or got
> rid of stylesheets, and that is good, but it might be beneficial to
> leave a note as to where these scripts could be found. (i.e. I might
> fork just the cocoa-to-xsl and stick it in my repositories in case it is
> useful to anyone, including me, in the future.) But then do we have to
> maintain that pointer? Maybe best just to delete and leave the existence
> of the closed issue on github as documentation. ;-)
>
> But honest, I'll give it a good 20 minutes of thought in trying to fix
> it --  if I can't in that time, then maybe we'll get rid of it.
>
> -James
>
>
> On 10/01/17 12:26, Piotr Banski wrote:
>> Hi James,
>>
>> I think the only good (or at least acceptable) argument is in your
>> last line (plus your request to get this assigned):
>>
>> > since it used to work I'm suspecting that making its output valid
>> again wouldn't necessarily be too difficult
>>
>> -- otherwise, I can't see why the Council should devote any time to
>> up-conversion from COCOA or any legacy format. This should always,
>> IMO, be an issue for a local project (formal or informal: funded or
>> performed as e.g. a semester assignment or on a friend-to-friend
>> basis). I feel scared by the prospect that Council time (including F2F
>> time) might get allocated to such matters, seeing the flood of
>> _current_ issues at GitHub.
>>
>> Best regards,
>>
>>   Piotr
>>
>>
>>
>>
>> On 01/09/2017 12:57 PM, James Cummings wrote:
>>> Hi Martin,
>>>
>>> Apologies for the delay in answering.  In case anyone is interested
>>> in their history: Yes, that initial cocoa-to-xsl script was indeed
>>> written by me (and then improved substantially into doing multiple
>>> passes by Sebastian with additional debugging output).  My first
>>> attempts at doing so were to make the Cocoa file well-formed XML
>>> (through wrapping a root element around it, changing any individual
>>> tags to be milestone-like pseudo elements) which was done in Perl
>>> before using XSLT, and then running that through two stylesheets, one
>>> to transform the flat cocoa markup language to a flattened TEI markup
>>> language (i.e. rename things) and then the next to fill that
>>> flattened TEI with structure (using xsl:for-each-group).  The
>>> original files were written in 2004, with the express purpose of
>>> being used for a subset of the OTA consisting specifically of verse
>>> drama and I even spoke about this at ALLC-ACH 2004 in Gothenburg. They
>>> are available for now at
>>> http://users.ox.ac.uk/~jamesc/research/cocoa2tei/ where I put them in
>>> early 2006. I was doing this in a very modular way with the intent of
>>> applying to different sorts of texts as well. Partly this was created
>>> to estimate for a (failed) funding bid by the OTA how much work it
>>> would take to convert a large portion of the OTA archives all to TEI
>>> (whilst maintaining the original files of course).  [If I remember
>>> the reviews correctly it failed partly because one reviewer was
>>> outraged that mere technicians might *change* the files lovingly
>>> crafted by academics, even though the bid clearly spelled out that
>>> this was only format conversion and that the original files would
>>> also remain available.  Ah well.] In 2010 as part of a business as
>>> usual drive to migrate some of the OTA texts into TEI P5, my email
>>> archive shows that Sebastian started working on a new cocoa-to-tei
>>> conversion and took my XSLT as the starting point and added the regex
>>> and extra passes to do it all in a single XSLT while also making
>>> it a bit more general.  I contributed some bits and pieces.  That is
>>> probably why it isn't as generalised as it could be.  But I think it
>>> serves better as an example than a generalised conversion.
>>>
>>> Although the OTA is now in the Bodleian Libraries, not here at IT
>>> Services, I doubt it will ever get rid of its Cocoa texts (even if it
>>> provides migrated versions of them). While it may be easier to just
>>> get rid of the conversion I would certainly be willing to give it a
>>> go in trying to fix it. (i.e. If you've made a github issue, feel
>>> free to assign to me.) As to why we should have and maintain legacy
>>> conversions like this, I would say that it is as an example for those
>>> approaching similar migration up-conversions.  I'm not saying that is
>>> a very good reason, but since it used to work I'm suspecting that
>>> making its output valid again wouldn't necessarily be too difficult.
>>>
>>> -James
>>>
>>> On 07/01/17 18:40, Martin Holmes wrote:
>>>> What I'm actually trying to do is understand the situation better,
>>>> and I'm getting there. This is what I think we've learned:
>>>>
>>>>  - Cocoa is not very amenable to a generic conversion process, since
>>>> its syntax and usage is extremely variable;
>>>>
>>>>  - Our current Cocoa conversion script (James will hopefully confirm
>>>> or refute this) was written specifically to convert certain OTA
>>>> texts, and could not be applied generically;
>>>>
>>>>  - In any case, that script is broken (it produces invalid TEI);
>>>>
>>>>  - There is a possible way forward in writing a conversion that
>>>> creates milestone elements, as suggested by Lou, but that would
>>>> still depend on knowledge of the actual syntax and special
>>>> characters used in any specific set of texts;
>>>>
>>>>  - There are still some Cocoa texts out there which have not yet
>>>> been converted to TEI.
>>>>
>>>> What I take from this is that it would probably make more sense to
>>>> work with those people such as Elizabeth who have Cocoa texts that
>>>> matter, and get those converted to TEI; meanwhile we could retire
>>>> the conversion that doesn't work, and remove it from the Stylesheets
>>>> repo. This is obviously a question for Council, though.
>>>>
>>>> Cheers,
>>>> Martin
>>>>
>>>>>
>>>>> cheers
>>>>> stuart
>>>>>
>>>
>>>
>>
>
>
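
The first step James describes above (wrapping the Cocoa file in a root element and turning its individual tags into milestone-like elements so that the result is well-formed XML) might be sketched roughly as follows. This is only an illustration in Python, not the Perl or XSLT from the Stylesheets. It assumes Cocoa references of the shape `<CODE value>` (for example `<T Hamlet>`), which real texts may not follow; the function name `cocoa_to_flat_xml`, the root element name `cocoa`, and the mapping onto `milestone` with `unit` and `n` attributes are all hypothetical choices made for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# ASSUMPTION: a Cocoa reference looks like <CODE value>, e.g. <T Hamlet>.
# Actual Cocoa usage varies from text to text, so this pattern would need
# adjusting for any specific collection.
COCOA_TAG = re.compile(r"<(\w+)\s+([^<>]*)>")


def cocoa_to_flat_xml(text, root="cocoa"):
    """Return well-formed XML in which each Cocoa reference becomes an
    empty milestone-like element and everything else is escaped text."""
    parts = []
    pos = 0
    for match in COCOA_TAG.finditer(text):
        # Escape the running text before this reference (&, <, >).
        parts.append(escape(text[pos:match.start()]))
        unit = match.group(1)
        value = match.group(2).strip()
        # Hypothetical mapping: category code -> @unit, value -> @n.
        parts.append("<milestone unit=%s n=%s/>"
                     % (quoteattr(unit), quoteattr(value)))
        pos = match.end()
    # Escape whatever follows the last reference.
    parts.append(escape(text[pos:]))
    return "<%s>%s</%s>" % (root, "".join(parts), root)


if __name__ == "__main__":
    sample = "<T Hamlet>\n<A 1>To be & not to be"
    flat = cocoa_to_flat_xml(sample)
    # Parsing succeeds only if the output is well-formed.
    for m in ET.fromstring(flat).iter("milestone"):
        print(m.get("unit"), m.get("n"))
```

The output is deliberately flat: giving it structure (the second stage James mentions, done with xsl:for-each-group) would still depend on knowing what each code means in a given set of texts.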