Editing Arabic TEI

Editing Arabic TEI

Birnbaum, David J
Dear TEI-L,

I don't work with Arabic texts myself, but some of my students and
colleagues do, and one has just asked me to recommend an XML editor. I use
Oxygen for all of my own work, and I've watched the video that Syncro Soft
produced about editing Arabic in the Oxygen Author view, but before I
point my colleague in that and only that direction, I wanted to ask what
others have used to edit Arabic TEI documents. When I've poked at Arabic
in Oxygen before (older versions, though, so the following may no longer
be the case), the Text view sometimes stranded angle brackets in the wrong
place. The Author view obviously (no angle brackets) didn't do that, but
it broke onto new lines in places that made sense from an engineering
perspective, but that made the continuous text harder to read. All in all,
it was usable, but should my colleague also be considering alternatives?

Thanks,

David

Re: Editing Arabic TEI

Ondine LeBlanc
At the 2015 Institute for Editing Historical Documents, we had a participant who is working on editing (transliterating and translating) Ottoman legal texts, and his plans now include a digital edition. His name is Safa Saraçoğlu, and he presented on his project at the Association for Documentary Editing meeting last summer. I would be surprised if he hasn't now had a good bit of experience with this. I believe he is still at Bloomsburg University: http://bloomu.edu/saracoglu

Ondine

--
Turning Points in American History is on display at the MHS Monday through Saturday from 10 AM to 4 PM through 25 February. More information is available at www.masshist.org.

Ondine LeBlanc, Director of Publications
Massachusetts Historical Society
1154 Boylston Street, Boston, MA 02215
Phone: 617-646-0524, Fax: 617-859-0074
Email: [hidden email]

Re: Editing Arabic TEI

Ondine LeBlanc
Okay, just need to add that I am not a fraud, really. This message is about XML and oXygen and Arabic characters. Really. Maybe the detector doesn't like the Turkish name?

-----Original Message-----
From: TEI (Text Encoding Initiative) public discussion list [mailto:[hidden email]] On Behalf Of Ondine LeBlanc
Sent: Thursday, February 02, 2017 4:50 PM
To: [hidden email]
Subject: Re: Editing Arabic TEI

[This sender failed our fraud detection checks and may not be who they appear to be. Learn about spoofing at http://aka.ms/LearnAboutSpoofing]


no worries (was: Editing Arabic TEI )

Piotr Bański
Dear Ondine,

That notice was most probably added by your Exchange / Outlook server
upon receiving your own message redirected from TEI-L. It's Microsoft
that doubts you, not the TEI crowd. _Our_ faith in you does not
falter... :-)

Best wishes,

   Piotr



Re: no worries (was: Editing Arabic TEI )

Paterson, Duncan
Dear Ondine,

My own work is in CJK, but I have helped a few colleagues and friends working with other non-Latin scripts to set up Oxygen for TEI work.

Of the two basic problems I encounter most frequently, the first is badly configured fonts. If the font configured in Oxygen's preferences for displaying XML is not a good match (or is simply the default), you will see strange artifacts or weird lines, even though everything is fine with the underlying TEI files.

The other problem relates to choosing UTF-8 or UTF-16 encoding, but I don't think the double-byte character issue applies to Arabic.
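
For what it is worth, letters in the basic Arabic block occupy two bytes in both UTF-8 and UTF-16, so the choice makes little difference for the Arabic text itself. A minimal sketch to check this, using one arbitrarily chosen letter (an assumption for illustration, not taken from any document discussed here):

```python
# Compare how many bytes a single Arabic letter needs in each encoding.
# The sample letter is an arbitrary choice for illustration.
letter = "\u0628"  # ARABIC LETTER BEH

utf8_len = len(letter.encode("utf-8"))
# The "-le" variant is used so that no byte order mark is counted.
utf16_len = len(letter.encode("utf-16-le"))

print(f"UTF-8:  {utf8_len} bytes")
print(f"UTF-16: {utf16_len} bytes")
```

The ASCII markup around the text is where the two encodings differ in size: one byte per character in UTF-8, two in UTF-16.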

Greetings
Duncan




Re: Editing Arabic TEI

Frederik Elwert
In reply to this post by Birnbaum, David J
Dear David,

I helped a colleague set up Oxygen for editing an Arabic manuscript. It
works reasonably well. Indeed, the Author view is better suited to substantial
work on the text, but I guess this is an inherent problem of XML for RTL
languages: the tags themselves run LTR and the text in between runs in the
other direction, so the cursor keeps swapping direction, which makes its
behaviour slightly unpredictable.
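
The direction swapping can be made visible by listing the Unicode bidirectional
class of each character in a tagged line. The following is only a sketch: the
fragment is a hypothetical example, and it relies on nothing beyond Python's
standard unicodedata module.

```python
import unicodedata

# A hypothetical TEI fragment: Latin-script markup around three Arabic letters.
fragment = "<l>\u0627\u0628\u062C</l>"

# One bidirectional class per character, in storage (logical) order.
classes = [unicodedata.bidirectional(ch) for ch in fragment]

for ch, cls in zip(fragment, classes):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The tag name reports the strong left-to-right class "L", the Arabic letters the
strong right-to-left class "AL", and the angle brackets the neutral class "ON",
so an editor has to change direction at every tag boundary.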

Best,
Frederik



--
Dr. Frederik Elwert

Digital Humanities Coordinator
Center for Religious Studies
Ruhr-University Bochum

Universitätsstr. 90a
D-44780 Bochum

Phone +49(0)234 32-23024

Re: Editing Arabic TEI

Markus Schnoepf
In our project, Corpus Coranicum, and some satellite projects (early Arabic poetry, Paleocoran), we use Oxygen XML for editing the comments. We have also developed a font, called Coranica (http://corpuscoranicum.de/about/tools), that displays Arabic text better than standard fonts do (Arabic script is normally too small when used in a mixed-script environment). [We are currently working on an English translation; at the moment the page is only in German, sorry.] We also want to prepare our common editing environment, ediarum (a "plugin" for Oxygen that gives editors a set of tags normally needed in digital editing), for use with Arabic and other RTL writing systems (see https://github.com/telota/ediarum).

Best, Markus


Re: Editing Arabic TEI

Radu Coravu
In reply to this post by Frederik Elwert
Hi,

As a developer working for Oxygen XML Editor I fully agree with
Frederik's analysis. The Author visual editing mode should be more
comfortable for RTL editing.

About David's remark:

> The Author view obviously (no angle brackets) didn't do that, but
> it broke onto new lines in places that made sense from an engineering
> perspective, but that made the continuous text harder to read.

If you have some sample TEI documents and give us some hints about what
does not work as expected we could try to improve the behavior in a
future version. Unfortunately we do not use RTL writing ourselves so
sometimes it's hard for us to understand what the expected editing
behaviors should be, that's why we need help with this.

Regards,
Radu

Radu Coravu
<oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com


Re: Editing Arabic TEI

Gioele Barabucci-2
On 03.02.2017 at 09:53, Radu Coravu wrote:
> If you have some sample TEI documents and give us some hints about what
> does not work as expected we could try to improve the behavior in a
> future version. Unfortunately we do not use RTL writing ourselves so
> sometimes it's hard for us to understand what the expected editing
> behaviors should be, that's why we need help with this.

Dear Radu, dear participants,

We at the Averroes project (Uni of Cologne, DARE, CCeH) [1] have plenty
of material I can send you to illustrate the "ergonomic" problems that
editors face when using oXygen to edit Arabic and also Hebrew
texts. I'll contact you privately.

Mostly it has to do with a clash of expectations between what happens
when letters are typed and how things appear on the screen, for example
when Latin characters (tags or punctuation marks) and Arabic characters
are on the same line.

A concrete example. Suppose that A, B and C are Arabic letters and | is
the cursor. If you type "<line>", then A, then B then C, you get the
following result (correct)

<line>|CBA

If, at that point, you type a period, you will get the incorrect

<line>|CBA.

instead of the correct version

<line>|.CBA

I used "incorrect", but the behaviour is not really incorrect. As David
said, one can see the engineering reasons behind it, but the editors are
used to other word processing applications and the behaviour of oXygen
just feels wrong to them.
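
To separate what is stored from what is shown, here is a minimal sketch of the
same keystroke sequence in Python. The three letters standing in for A, B and C
are an arbitrary choice, and the link to the period's placement is a likely
explanation rather than a description of how oXygen is implemented.

```python
import unicodedata

# Keystrokes in the order described above: the tag, three Arabic letters,
# then a period. The letters are arbitrary stand-ins for A, B and C.
typed = "<line>" + "\u0627\u0628\u062C" + "."

# In memory (logical order) the period is simply the last character typed.
period_is_last = typed[-1] == "."

# The letters carry a strong right-to-left class; the period does not.
letter_class = unicodedata.bidirectional(typed[-2])
period_class = unicodedata.bidirectional(typed[-1])

print(f"period stored last: {period_is_last}")
print(f"letter class: {letter_class}, period class: {period_class}")
```

The file content is the same in both renderings. Because the period has no
strong direction of its own, the Unicode Bidirectional Algorithm lets it take
its direction from context, so in a line whose base direction is LTR it can end
up on the right of the Arabic run, where an RTL word processor would put it on
the left.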

This is just an example. There are plenty of more complicated cases I
can illustrate. Solving them would improve the quality of life of the
editors and transcribers very much. ;)

Regards,

[1] http://averroes.uni-koeln.de/

--
Gioele Barabucci <[hidden email]>

Re: no worries

Gioele Barabucci-2
In reply to this post by Paterson, Duncan
On 03.02.2017 at 00:39, Paterson, Duncan wrote:
> The other problem relates to choosing utf8 or utf16 encoding, but I
> don't think double byte characters applies to Arabic.

Allow me a technical observation and some advice. UTF-16 is only a useless
historical artefact. It should not be used in any new project. The only
sane choices are UTF-8 and (if really needed) UCS-4.

UTF-16 is the worst of all encodings: it wastes bits like UCS-2/4, is
incompatible with ASCII and is as computationally hard to work with as
UTF-8.

One should use either UTF-8 for ASCII compatibility and space savings or
UCS-4 for speed of computation (but only under certain particular
circumstances).
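
The space argument is easy to check. The sketch below encodes one hypothetical
TEI-like line (an assumption for illustration) in three encodings, with Python's
UTF-32 codec standing in for UCS-4, and then shows that UTF-16 is not
fixed-width either.

```python
# Hypothetical TEI-like line: ASCII markup around a four-letter Arabic word.
sample = "<l>\u0643\u062A\u0627\u0628</l>"

# The "-le" codec variants are used so that no byte order mark is counted.
sizes = {
    name: len(sample.encode(name))
    for name in ("utf-8", "utf-16-le", "utf-32-le")
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes for {len(sample)} characters")

# A character outside the Basic Multilingual Plane needs a surrogate pair,
# i.e. four bytes, in UTF-16.
rumi_digit = "\U00010E60"  # RUMI DIGIT ONE
surrogate_pair_len = len(rumi_digit.encode("utf-16-le"))
print(f"UTF-16 bytes for U+10E60: {surrogate_pair_len}")
```

For markup-heavy XML the ASCII tags dominate, which is where UTF-8 saves the
most space.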

Regards,

--
Gioele Barabucci <[hidden email]>

XML character encoding (was "Re: [TEI-L] no worries")

Syd Bauman-10
The sentiment, if a bit overstated, is correct IMHO. Another
disadvantage is that an XML file encoded in UTF-16 must begin with a byte
order mark (U+FEFF). Does the operating system handle that? Does the
XML editor? Do I?

The full name for UCS-4 in an XML declaration is "ISO-10646-UCS-4",
and this is one of the few places where XML is case insensitive. (So
a processor should recognize "ISO-10646-ucs-4" just as well.)
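
Both points can be tried out with a short sketch. It uses Python's standard
library rather than any particular XML editor, and the one-element documents
are made up for the purpose.

```python
import codecs
import xml.etree.ElementTree as ET

# Python's generic "utf-16" codec writes a byte order mark before the text.
encoded = "<a/>".encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(f"starts with a byte order mark: {has_bom}")

# The encoding name in the XML declaration is matched case-insensitively.
upper = ET.fromstring(b'<?xml version="1.0" encoding="UTF-8"?><a>x</a>')
lower = ET.fromstring(b'<?xml version="1.0" encoding="utf-8"?><a>x</a>')
print(f"same content either way: {upper.text == lower.text}")
```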

All that said, I don't know how to get my operating system to read &
write UCS-4 (or even UTF-16, not that I care), so I always use UTF-8.
:-|


Re: Editing Arabic TEI

Birnbaum, David J
In reply to this post by Gioele Barabucci-2
Dear TEI-L,

Thanks very much to all who responded to my inquiry about editing Arabic
in <oXygen/>. I have passed the information along to the colleague on
whose behalf I was asking.

Best,

David