A question about encoding and text fixing platforms for undergraduate curators

Martin Mueller

I’m not sure whether this is a sensible question to ask, but I’ll ask it anyhow.

 

This summer we’ll have a number of undergraduate curators of TCP texts fixing this and that, mainly incompletely transcribed words, but sometimes longer stretches of text or whole pages.

 

So we’ll need some transcription platform in addition to an eXist site at http://shc.earlprint.org, where single words can be fixed by changing the content of a <w> element, mercifully invisible to the user.
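For readers wondering what such a single-word fix amounts to underneath the interface, here is a minimal sketch in Python using only the standard library's ElementTree. It assumes each word is a TEI <w> element carrying an xml:id; the function name fix_word, the ids, and the sample fragment are hypothetical, and this is not the code behind shc.earlprint.org.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
ET.register_namespace("", TEI_NS)  # keep TEI as the default namespace on output


def fix_word(root, word_id, corrected):
    """Replace the text of the TEI <w> whose xml:id equals word_id.

    Only the word's content changes; the element and its attributes stay put.
    Returns True if a matching <w> was found, False otherwise.
    """
    for w in root.iter(f"{{{TEI_NS}}}w"):
        if w.get(XML_ID) == word_id:
            w.text = corrected
            return True
    return False


# Hypothetical fragment; the ids and the damaged token are illustrative only.
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<w xml:id="w1">the</w> <w xml:id="w2">Lo●d</w>'
    "</p>"
)
root = ET.fromstring(sample)
fix_word(root, "w2", "Lord")
print(ET.tostring(root, encoding="unicode"))
```

Because only the element's text is touched, the surrounding markup and the word's identifier survive the correction.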

 

If you believe that the best tool is the tool you know best, you’d try to figure out whether undergraduates could do this work using Microsoft Word with a set of styles that subsequently support the automatic transformation of the Microsoft Word passages into XML fragments that can be fitted into the TCP transcriptions.

 

Is that a plausible scenario and has something like that been done? TCP encoding is quite sparse. Text is either marked (inside <hi>) or unmarked, and the transcription is silent about what the unmarked state is.  My rough guess is that a dozen elements will cover the vast majority of cases.
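To make the Word-styles scenario concrete, here is one possible shape for the mapping step, sketched in Python under stated assumptions: the paragraph and character style names have already been extracted from the document, each style maps to exactly one TEI element, and anything unmapped is rejected rather than guessed at. The style names (TEI-p, TEI-hi, and so on) and the function paragraph_to_tei are hypothetical; a real template would list the dozen or so elements the texts actually need.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace

# Hypothetical template: Word style name -> TEI element name.
BLOCK_STYLES = {"TEI-head": "head", "TEI-p": "p", "TEI-l": "l"}
INLINE_STYLES = {"TEI-hi": "hi"}


def tei(local_name):
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{local_name}"


def paragraph_to_tei(style, runs):
    """Convert one styled Word paragraph into one TEI element.

    style: the paragraph style name.
    runs:  (character_style or None, text) pairs in document order.
    Unmapped styles raise ValueError instead of being guessed at.
    """
    if style not in BLOCK_STYLES:
        raise ValueError(f"unmapped paragraph style: {style!r}")
    el = ET.Element(tei(BLOCK_STYLES[style]))
    last = None  # latest inline child; plain text after it belongs in its .tail
    for char_style, text in runs:
        if char_style is None:
            if last is None:
                el.text = (el.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        elif char_style in INLINE_STYLES:
            last = ET.SubElement(el, tei(INLINE_STYLES[char_style]))
            last.text = text
        else:
            raise ValueError(f"unmapped character style: {char_style!r}")
    return el


# Illustrative input, as if read from a paragraph styled "TEI-p".
p = paragraph_to_tei("TEI-p", [(None, "See the "), ("TEI-hi", "Preface"), (None, " above.")])
print(ET.tostring(p, encoding="unicode"))
```

Raising on unknown styles is a deliberate choice: a silent fallback would let stray formatting slip into the TEI unnoticed.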

 

The Folger Library has an attractive Web-based tool for manuscript transcription that can probably be adjusted with little trouble.

 

The students will be in residence for six weeks, and it may be that we should teach them encoding with oXygen. Some of them may love it, others may hate it.

 

I’d be grateful for advice and practical war stories about what does and does not work.

Re: A question about encoding and text fixing platforms for undergraduate curators

Elisa Beshero-Bondar-2
Hi Martin,
If you have students coming for **six weeks** to work at the Folger, why not orient them to writing and working with the code? I work regularly with undergrads, and it takes about one week for them to orient themselves to angle brackets and oXygen. Also, I don’t believe that most undergrads “know” Microsoft Word especially well. Why create extra work by force-fitting Microsoft styles to plain, simple, sparse code?

In my experience (regularly, from Fall 2012 onward), students take to it like ducks to water and tend to appreciate working directly with the angle-bracket code. Seems to me a six-week institute is a perfect opportunity to teach them to code as part of the experience! 

Elisa
-- 
Elisa Beshero-Bondar, PhD
Director, Center for the Digital Text | Associate Professor of English
University of Pittsburgh at Greensburg | Humanities Division
150 Finoli Drive
Greensburg, PA  15601  USA
E-mail:[hidden email]
Development site: http://newtfire.org

On Apr 26, 2017, at 7:09 PM, Martin Mueller <[hidden email]> wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Martin Mueller

They’re not working at the Folger. They’ll be working at Northwestern, though using the Web-based Folger platform is a possible option. You’re right in saying that the students may not know Word all that well. I would like to believe that students take to angle brackets like ducks to water, and for what it’s worth, I much prefer working with the text mode in oXygen and haven’t found much use for the author mode. Still, I’ll need more persuading on the ducks-to-water claim.

 

 

From: Elisa Beshero-Bondar <[hidden email]>
Date: Wednesday, April 26, 2017 at 7:10 PM
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Syd Bauman-10
Martin -- For encouragement using MS Word as an XML editor and advice
on how to do so ask Laura Mandell from Texas A&M. The rest of us will
try to warn you off the idea.

I, for one, think you would be doing the undergraduates you hire a
grave disservice not to spend the 1/2 day it will take to teach them
XML, oXygen, and enough TEI to do the work you need.[1] For some it
will be a skill they just use to do the job you've hired them for
better. For some it will prove mildly useful at cocktail parties and
perhaps when dealing with tech support. For a few, though, it will be
a life-long skill that will prove invaluable over and over again.

Notes
-----
[1] And you should feel free to use the WWP teaching materials. E.g.,
    see
    http://www.wwp.neu.edu/outreach/seminars/_current/presentations/xml_intro/xml_newIntro_tutorial_00.xhtml
    or, just slides w/o the tutorial notes at
    http://www.wwp.neu.edu/outreach/seminars/_current/presentations/xml_intro/xml_newIntro_00.xhtml

Re: A question about encoding and text fixing platforms for undergraduate curators

Martin Mueller
When it comes to undergraduates on grant-funded summer research projects and the cost/benefit analysis of  how to manage work, I am very sensitive to giving appropriate weight to  the long-term benefit of students.

On 4/26/17, 7:55 PM, "TEI (Text Encoding Initiative) public discussion list on behalf of Syd Bauman" <[hidden email] on behalf of [hidden email]> wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Elisa Beshero-Bondar-2
In reply to this post by Syd Bauman-10
Absolutely agreed with Syd. And I’m sorry if my three or four years of field experience in teaching undergrads at Pitt-Greensburg fails to convince you about the “ducks to water” claim, but I will repeat it, and even indicate that my “one-week” period is exaggerated. It really takes, as Syd indicates, some *hours*. I believe it takes an overnight homework assignment that orients them to coding. I’ve never had trouble with teaching XML—and when students tend to feel overwhelmed with my coding course it’s much later—perhaps with schema writing (though not typically)—it’s usually when we get to writing regular expressions and up-converting plain text to XML. Then it becomes something of an obsessive-compulsive video-game experience, I think, and they get frustrated when the expressions they try don’t work or make ill-formed code. They tend to get befuddled for maybe a week in figuring out how to write template matches in XSLT, but they get over that, too. 

I’ve seen students get stuck and lost with coding, but it’s not with the XML writing part of it. They even enjoy XPath. Really. Give those undergrads a try—they *do* like to learn to code, and it seems a shame not to give them the opportunity when you have them for 6 weeks working on TEI files. 

Elisa

On Apr 26, 2017, at 8:55 PM, Syd Bauman <[hidden email]> wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Nesvet, Rebecca
Agree with Elisa. My undergraduate English Lit students (about 40 in 2 semesters, encoding and marking up 43 chapters of Victorian fiction) have also learned angle brackets in one day, as well as things like &#8211; (en dash) versus &#8212; (em dash). It's vocab, and they have been doing vocab since elementary school. Having them handwrite a "vocab list" of code seems to root it in their minds. Or maybe it's Elisa's pedagogy, as I learned coding in her workshop.

Re: A question about encoding and text fixing platforms for undergraduate curators

Paterson, Duncan
In reply to this post by Martin Mueller
Hello Martin, 

I agree with the others that MS Word might not be the best choice. Why not use a custom plugin for oXygen’s Author mode, built via plugin-builder? Customising it for the project’s needs (see here) was very quick and painless. In combination with GitHub, we have thus created a dual-stream proofreading environment. Getting from zero to GitHub and XML took students hours instead of days. 

You can see my instructions to the students (with just a single proofing stream) here.

The good: 
- Allow students familiar with XML to work in the environment of their choice (even Word or Excel). 
- Use oXygen Author for UI-only editing, while limiting the amount of *damage* that students can do.
- Let GitHub handle conflict highlighting and progress tracking. 
- Consistent markup.
- Very fast results.

The bad:
- Not much.
- The very first round of pull requests is a bit of a mess.
- Installing software can be hard.
- About two full days of prep work: splitting XML into editable chunks, adding icons to the plugin, and some CSS. 
- I don’t like Author mode.

Greetings
Duncan

On 27. Apr 2017, at 01:09, Martin Mueller <[hidden email]> wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Stephen H. Gregg
In reply to this post by Nesvet, Rebecca
Dear Martin,
My own English Literature students took to working with XML in oXygen very quickly. I had them doing some basic markup on a poem in 3 hours, thanks to James Cummings' worksheets! I also agree that getting them to work via Word seems a long way around and perhaps unnecessarily complicated.

Hope that helps,
Stephen
(my first post to this list ...) 

On 27 April 2017 at 02:54, Nesvet, Rebecca <[hidden email]> wrote:
[...]



--
Dr Stephen H. Gregg, FHEA
Senior Lecturer in English Literature
Twitter: @gregg_sh

T: +44 (0)1225 875482  M: +44 (0)7771 702912
Visit www.bathspa.ac.uk
Newton Park, Bath, BA2 9BN


Re: A question about encoding and text fixing platforms for undergraduate curators

Peter Flynn-8
In reply to this post by Elisa Beshero-Bondar-2
On 04/27/2017 02:06 AM, Elisa Beshero-Bondar wrote:
> Absolutely agreed with Syd.

No question about it. I would make them all learn oXygen: and then I
would let the clever ones learn Emacs with psgml-mode as an honour, and
nxml-mode as a treat.¹

At worst, they will learn that there is an entire universe out there
outside the Microsoft bubble; at best, they will become markup addicts
and join our community here.

///Peter
--
¹ Thank you W Churchill.

Re: A question about encoding and text fixing platforms for undergraduate curators

Martin Holmes
In reply to this post by Martin Mueller
+1 for the ducks-to-water claim from me. We've been teaching students
XML for years, and I've never had a single instance where a student
struggled with it at all. It takes a couple of hours to get people
into basic encoding.

XML is not hard. Word, by contrast, is a concoction of frustrations, and
getting DOCX into decent TEI when you're done is horribly difficult.

Cheers,
Martin

On 2017-04-26 05:23 PM, Martin Mueller wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

James Cummings-4
In reply to this post by Elisa Beshero-Bondar-2
Hi Elisa,

I agree as well. I think it all depends on the project. For the
William Godwin Diary project
(http://godwindiary.bodleian.ox.ac.uk/index2.html) the 3 students
doing the encoding had 1.5 days of training, which included an
explanation of Subversion as well. Yes, they occasionally asked
questions but mostly got on with really quite rich and deep
encoding. They were all literature or politics grads with no
previous technical training.

Maybe it all comes down to the teaching. ;-)

-James

On 27/04/17 02:06, Elisa Beshero-Bondar wrote:
[...]

--
Dr James Cummings, [hidden email]
Academic IT Services, University of Oxford

Re: A question about encoding and text fixing platforms for undergraduate curators

Birnbaum, David J
> Maybe it all comes down to the teaching. ;-)

+1

Re: A question about encoding and text fixing platforms for undergraduate curators

Lou Burnard-6
What he said.

Sent from my Huawei Mobile

-------- Original Message --------
From: "Birnbaum, David J"
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

Sewell, David R. (drs2n)
In reply to this post by Martin Holmes
On Thu, 27 Apr 2017, Martin Holmes wrote:

[...]
> XML is not hard. Word, by contrast, is a concoction of frustrations, and
> getting DOCX into decent TEI when you're done is horribly difficult.

If you create a Word template with styles (whether built-in or custom) that are
sufficient to define each block-level and inline element that needs to be
expressed in XML, then it's certainly feasible to set up a workflow involving a
translation tool like OxGarage. It mainly requires careful analysis of the
result of OxGarage conversion in order to create a further XSLT transform to
produce your desired final output. But in our experience (based on using such a
workflow with distributed authors for a reference project), it is almost
inevitable that now and then the Word file you receive will have some kind of
unexpected crud in it, resulting from the user adding an unforeseen style,
accidentally changing a format, or doing something else unpredictable. It is
probably easier to teach people to use a consistent set of markup conventions in
oXygen than to get them to use MS Word with 100% accuracy according to whatever
template and style rules you want them to follow.
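One way to catch that kind of crud before conversion, offered here as an assumption rather than as part of the workflow described above, is to list every paragraph and character style a submitted .docx actually uses and compare the result with the template's allowed set. This minimal Python sketch reads word/document.xml from the DOCX zip package with the standard library only; the function names and the example style IDs are hypothetical. Note that w:pStyle and w:rStyle record style IDs (e.g. Heading1), which can differ from the display names users see, and that formatting applied directly without a style is not caught.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def styles_in_xml(document_xml):
    """Return the paragraph (w:pStyle) and character (w:rStyle) style IDs used.

    Paragraphs left in the default style carry no w:pStyle and are not listed.
    """
    root = ET.fromstring(document_xml)
    return {
        el.get(W + "val")
        for tag in ("pStyle", "rStyle")
        for el in root.iter(W + tag)
    }


def styles_used(docx_path):
    """Read word/document.xml out of a .docx (a zip package) and list its styles."""
    with zipfile.ZipFile(docx_path) as z:
        return styles_in_xml(z.read("word/document.xml"))


def unexpected_styles(used, allowed):
    """Styles present in the document but not defined by the template, sorted."""
    return sorted(set(used) - set(allowed))


# Hypothetical usage against a submitted file and the template's style IDs:
# print(unexpected_styles(styles_used("submission.docx"), {"TEIhead", "TEIp", "TEIhi"}))
```

Keeping the XML parsing separate from the zip handling makes the check easy to run on extracted document.xml files as well.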

David

--
David Sewell
Manager of Digital Initiatives
The University of Virginia Press
Email: [hidden email]   Tel: +1 434 924 9973
Web: http://www.upress.virginia.edu/rotunda

Re: A question about encoding and text fixing platforms for undergraduate curators

Martin Mueller
In reply to this post by Birnbaum, David J
Perhaps it does, but sometimes I’m tempted to adapt Margaret Thatcher’s “There is no such thing as society” to “There is no such thing as teaching, there is only learning.”  

That said, I’m persuaded by the chorus of good advice, which pushed me in a direction that I’m quite happy to be pushed in.

On 4/27/17, 8:32 AM, "TEI (Text Encoding Initiative) public discussion list on behalf of Birnbaum, David J" <[hidden email] on behalf of [hidden email]> wrote:
[...]

Re: A question about encoding and text fixing platforms for undergraduate curators

ron.vandenbranden
Administrator
In reply to this post by Sewell, David R. (drs2n)
Hi,

I've been struggling with such issues the last couple of weeks. We're a
partner in a correspondence edition project which has engaged a limited
number of (elderly) volunteers who, in a first phase, will be checking
existing transcriptions (in DOCX format) and possibly enriching them with
more information. In order to stay as close as possible to both the
volunteers' comfort zone and the existing Word files, we've started a
first editing round in Word for which we've defined a limited number of
Word styles, mainly to identify implicit text structures. Such styles
will aid a subsequent conversion to XML to get decent, minimally
structured TEI out of it. For later editing rounds, I /hope/ to get
(some of) the volunteers comfortable with a tightly configured graphical
XML editing environment. Since Oxygen Web Author was out of reach
budget-wise, I'm exploring XMLMind (http://www.xmlmind.com/xmleditor/),
which has a number of pluses for this project (projects like ours are
covered by the free license, it's highly configurable, it can work with
remote files on Google Drive, and it can be offered fully (and hence
centrally) configured as a downloadable program that doesn't need installation).

Introducing the volunteer group to a more structured way of "styling"
Word documents proved challenging enough, but I hope this way of
approaching the transcriptions can serve as an introduction to
working in a more structured graphical XML editing environment, with the
added incentive that such an environment would let them fully enrich
the transcriptions with information (annotations, named entities,
additions / deletions / unclear readings, ...) that they'll be craving to
add but have no means for in Word (see below). Additionally, since
restructuring existing XML structures is IMO one of the hardest parts for
novices (who in this case aren't interested in these encoding aspects
anyway), the "structuring" phase in Word should allow us to derive
properly structured XML, leaving them free to concentrate on further
enrichment. With proper configuration of the editor, and if all goes
well, this could then take the form of selecting text and applying the
right action ("mark as deletion"), much like applying styles in Word.

Of course, the resulting TEI texts will be proofed and edited further by
project staff who will be working with the actual XML code, most
probably in Oxygen.

On 27/04/2017 15:42, Sewell, David R. (drs2n) wrote:

> On Thu, 27 Apr 2017, Martin Holmes wrote:
>
> [...]
>> XML is not hard. Word, by contrast, is a concoction of frustrations,
>> and getting DOCX into decent TEI when you're done is horribly difficult.
>
> If you create a Word template with styles (whether built-in or custom)
> that are sufficient to define each block-level and inline element that
> needs to be expressed in XML, then it's certainly feasible to set up a
> workflow involving a translation tool like oXgarage. It mainly
> requires careful analysis of the result of oXgarage conversion in
> order to create a further XSLT transform to produce your desired final
> output.

I agree, as long as none of the text structures or phenomena you want to
express with Word styles need to be nested, since Word styles don't nest
within other styles of the same level (paragraph or character). For
example, if you have defined two paragraph-level styles for indicating
verse lines and block quotations, you can't combine them to mark a
verse line inside a block quotation, since only one paragraph-level style
can be applied at a time. Likewise, if you have defined two
character-level styles for additions and deletions and try to mark a
deletion inside an added text fragment, Word will instead fragment this
into [first bit of text with style for addition] [text with style for
deletion] [rest of text with style for addition]. You'll end up with a
flat sequence of separate styles, which after conversion will translate
into:

   <add>first bit of text with style for addition</add>
   <del>text with style for deletion</del>
   <add>rest of text with style for addition</add>

...where instead what you really want to express is:

   <add>first bit of text with style for addition
     <del>text with style for deletion</del>
   rest of text with style for addition</add>
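To make the flattening concrete, here is a minimal Python sketch. It is only an illustration, not part of any actual oXgarage/XSLT workflow, and it assumes the Word character runs arrive as hypothetical (style, text) pairs and that a hypothetical STYLE_TO_TEI table maps style names to TEI element names. A run-by-run conversion can only emit siblings; the nested structure has to be built by hand for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word character-style names to TEI element names.
STYLE_TO_TEI = {"addition": "add", "deletion": "del"}

# Hypothetical representation of the Word runs for "a deletion inside an
# addition": because Word cannot nest one character style inside another,
# the single addition is split into two runs around the deletion.
runs = [
    ("addition", "first bit of text "),
    ("deletion", "deleted text"),
    ("addition", " rest of text"),
]


def runs_to_tei(runs, parent_tag="p"):
    """Naive run-by-run conversion: every styled run becomes a sibling element."""
    parent = ET.Element(parent_tag)
    for style, text in runs:
        child = ET.SubElement(parent, STYLE_TO_TEI[style])
        child.text = text
    return parent


def nested_target():
    """The structure the editor actually meant, built by hand for comparison."""
    parent = ET.Element("p")
    add = ET.SubElement(parent, "add")
    add.text = "first bit of text "
    deletion = ET.SubElement(add, "del")
    deletion.text = "deleted text"
    # Text after </del> but still inside <add> is stored as the tail of <del>.
    deletion.tail = " rest of text"
    return parent


if __name__ == "__main__":
    print(ET.tostring(runs_to_tei(runs), encoding="unicode"))
    # prints: <p><add>first bit of text </add><del>deleted text</del><add> rest of text</add></p>
    print(ET.tostring(nested_target(), encoding="unicode"))
    # prints: <p><add>first bit of text <del>deleted text</del> rest of text</add></p>
```

Both outputs are well-formed, but only the second records that the deletion sits inside the addition. Nothing in the flat runs tells a converter whether the two addition runs were one <add> or two separate ones, which is why this can't simply be inferred afterwards.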

I don't think it's possible to up-convert such fragments (in all their
possible combinations) automatically in a meaningful way. The resulting
TEI text would always need thorough checking and restructuring, which
would cause extra work instead of saving time. Even creating separate
"synthetic" styles for such combinations (e.g. deletion-within-addition,
verseLine-within-quotation, ...) would quickly become unwieldy (of
course, all kinds of structures can nest in all kinds of combinations
and levels) and merely complicate the Word step without substantially
improving the resulting XML.

If my hopes come true, this could provide a pragmatic workflow for this
project, where a Word step could both be useful and function as a
didactic means towards a more structured editing environment.

Best,

Ron