encoding foreign names in original script & transcription


Carlson, Thomas Andrew

Dear colleagues,


I am looking for ways to encode names in various languages (Arabic, Armenian, etc.) in their original scripts as well as in transliteration, making sure the two strings are related in the code.  I can think of two ways to do this:


1. Using <choice> within <placeName>:

<placeName>
    <choice><seg xml:lang="ar">الموصل</seg><seg type="transliteration">al-Mawṣil</seg></choice>
</placeName>


I was tempted to use <orig>, but the transliteration is certainly not "corrected" and not really "regularized," so <corr> and <reg> do not make sense.  So I went to <seg> instead.


2. Using parallel corresponding <placeName>s:

<placeName xml:id="name1" type="original" corresp="#name2">الموصل</placeName>
<placeName xml:id="name2" type="transliteration" corresp="#name1">al-Mawṣil</placeName>


Are there other possibilities?  Is anyone facing a similar issue, and if so, how have you resolved it?


I will happily receive any input!


Warm wishes,

Thomas.


+-+-+-+-+-+-+-+-+-+-+
Thomas A. Carlson, Ph.D.
Assistant Professor of Middle Eastern History
History Department
Oklahoma State University
101 South Murray Hall
Stillwater, OK 74078-3054
+-+-+-+-+-+-+-+-+-+-+
[hidden email]
Twitter: @MedievalMidEast
+-+-+-+-+-+-+-+-+-+-+
OSU interprets every email sent or received from this email address as a record of the State of Oklahoma and guarantees no privacy for such communications.  If you do not wish your emails to become state records, ask me for my alternate email address.
+-+-+-+-+-+-+-+-+-+-+

Re: encoding foreign names in original script & transcription

Eduard Drenth

Hi,


A few thoughts on this:


1) seg is very broad / generic: a sort of fallback in this case

2) isn't transliteration more a dictionary thing?

3) choice can also be used inside placeName

4) reg isn't so bad I think, maybe <reg xml:lang="en">...</reg>

5) there is also <alt> to offer stand-off alternatives
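
Points 4 and 5 could look roughly like the sketch below. This is an illustration rather than a recommendation: the English form "Mosul" is an assumed value for the content left open in point 4, the xml:id values are reused from the original question, and whether <reg> is semantically appropriate here is exactly what the thread is debating.

```xml
<!-- Point 4: <orig>/<reg> paired in a <choice> inside <placeName> -->
<placeName>
  <choice>
    <orig xml:lang="ar">الموصل</orig>
    <reg xml:lang="en">Mosul</reg> <!-- "Mosul" is an assumed example value -->
  </choice>
</placeName>

<!-- Point 5: a stand-off <alt> linking two separately encoded forms;
     mode="incl" says the alternatives are not mutually exclusive -->
<placeName xml:id="name1" xml:lang="ar">الموصل</placeName>
<placeName xml:id="name2">al-Mawṣil</placeName>
<alt target="#name1 #name2" mode="incl"/>
```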


Eduard Drenth, Software Architekt


[hidden email]


Doelestrjitte 8

8911 DX  Ljouwert

+31 58 234 30 47

+31 62 094 34 28 (privé)


gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43




From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Carlson, Thomas Andrew <[hidden email]>
Sent: Wednesday, June 21, 2017 2:19 AM
To: [hidden email]
Subject: encoding foreign names in original script & transcription
 


Re: encoding foreign names in original script & transcription

Torsten Schassan-2
Hi Thomas,

isn't this a perfect example for the use of authority data? E.g.

<placeName ref="#entry">al-Mawṣil</placeName>

and somewhere else (inside <sourceDesc>):

<place xml:id="entry">
   <placeName xml:lang="xy" type="original">...</placeName>
   <placeName xml:lang="yz" type="transliteration">al-Mawṣil</placeName>
   <idno type="tgn">ID-No</idno>
   ...
</place>
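
For the Mosul example, the placeholders above might be filled in as in the following sketch. The xml:id value "mosul", the language tags "ar" and "ar-Latn", and the enclosing <listPlace> are assumptions made for illustration; the TGN identifier is left as a placeholder, as in the pattern itself.

```xml
<!-- In the text: the name points at the authority entry -->
<placeName ref="#mosul">al-Mawṣil</placeName>

<!-- In the header, e.g. inside <sourceDesc> -->
<listPlace>
  <place xml:id="mosul"> <!-- hypothetical identifier -->
    <placeName xml:lang="ar" type="original">الموصل</placeName>
    <placeName xml:lang="ar-Latn" type="transliteration">al-Mawṣil</placeName>
    <idno type="tgn">ID-No</idno> <!-- placeholder, not a real TGN number -->
  </place>
</listPlace>
```

With this layout every occurrence in the text carries only a pointer, and the relation between the original script and the transliteration is stated once, in the entry.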

Best, Torsten


On 21.06.2017 at 07:04, Eduard Drenth wrote:



--
Torsten Schassan - Digitale Editionen, Abteilung Handschriften und
Sondersammlungen
Herzog August Bibliothek, Postfach 1364, D-38299 Wolfenbuettel, Tel.:
+49-5331-808-130 (Fax -165)
Handschriftendatenbank* http://diglib.hab.de/?db=mss

Re: encoding foreign names in original script & transcription

Paterson, Duncan
In reply to this post by Carlson, Thomas Andrew
Hello Thomas, 

I'll second Torsten's recommendation: you can simply put an xml:lang attribute on the placeName in the main body of the text, and use a pointer or ID to refer to a place elsewhere in the document where all placeNames are defined in the main language of the TEI document.
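
A rough sketch of what that might look like, assuming the document's main language is English; the xml:id "mosul" and the English form "Mosul" are illustrative values, not taken from the thread:

```xml
<!-- In the body: the name as it appears, tagged with its own language -->
<p>... <placeName xml:lang="ar" ref="#mosul">الموصل</placeName> ...</p>

<!-- Elsewhere in the document: the entry in the document's main language -->
<place xml:id="mosul"> <!-- hypothetical identifier -->
  <placeName xml:lang="en">Mosul</placeName>
</place>
```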

Greetings
Duncan


Re: encoding foreign names in original script & transcription

Eduard Drenth

One other thing I would like to mention: relying on @type (and other semantically loose encodings) is not a very strict way of encoding, so semantic comparison and processing of the material will be harder and less generic.






From: TEI (Text Encoding Initiative) public discussion list <[hidden email]> on behalf of Paterson, Duncan <[hidden email]>
Sent: Wednesday, June 21, 2017 8:59 AM
To: [hidden email]
Subject: Re: encoding foreign names in original script & transcription
 

Re: encoding foreign names in original script & transcription

Lou Burnard-6
In reply to this post by Carlson, Thomas Andrew

I have two fairly obvious comments on this.

Firstly, a transliteration is not the same as a translation. If all you are doing is representing an Arabic word (placename or anything else) using Latin characters rather than Arabic, you should indicate that by using the appropriate language code e.g. xml:lang="ar-Latn". (see further the helpful guide at https://www.w3.org/International/questions/qa-choosing-language-tags.en)
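
Applied to the second approach in the original question, this might look like the sketch below. The xml:id and @corresp values come from that question; dropping @type in favour of the language tag is an assumption about how the advice would be applied.

```xml
<!-- "ar" = Arabic in its default script; "ar-Latn" = Arabic written in Latin script -->
<placeName xml:id="name1" xml:lang="ar" corresp="#name2">الموصل</placeName>
<placeName xml:id="name2" xml:lang="ar-Latn" corresp="#name1">al-Mawṣil</placeName>
```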

Secondly, if the transliteration isn't there in your source text but is something you are supplying, you really must wrap it up in a <choice> or a <note> or something, to indicate that this is the case. Otherwise, how will we know?
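
One possible shape for the <note> variant is sketched here. The @type value and the pointer "#editor" (which would have to resolve to a person or <respStmt> declared in the header) are hypothetical names chosen for illustration.

```xml
<!-- The source reads only the Arabic; the transliteration is editorial -->
<placeName xml:lang="ar">الموصل</placeName>
<note type="transliteration" xml:lang="ar-Latn" resp="#editor">al-Mawṣil</note>
```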

On the question as to whether you have to use a <seg> rather than a <reg> I am agnostic. There are doubtless some for whom using the notion of regularisation here would be inappropriate, because of the implication that the Arabic script is in some sense irregular! OTOH, if the bulk of your text is not in Arabic, this seems comparatively venial.

On the use of @type to indicate that this is a transliteration: as I suggested above, this is redundant if you use the proper language tag. Some simple-minded processors may however prefer you to make this more explicit, and a @type attribute value is the way to do this. If you define the range of possible values for @type in your ODD, this would also improve processability.
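
As a sketch of the ODD suggestion, the fragment below closes the value list for @type on <placeName>. The two values are the ones used in the original question; the descriptions are illustrative, and the @mode settings shown are one common pattern that may need adjusting for a particular customization.

```xml
<elementSpec ident="placeName" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- a closed list: any other value of @type becomes invalid -->
      <valList type="closed" mode="replace">
        <valItem ident="original">
          <desc>the name in its original script</desc>
        </valItem>
        <valItem ident="transliteration">
          <desc>the name transliterated into Latin script</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```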




Re: encoding foreign names in original script & transcription

Martin Holmes
In reply to this post by Carlson, Thomas Andrew
Hi there,

I think for the transliteration I would use:

<seg xml:lang="ar-Latn">

meaning that this is Arabic in Latin script.
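
Combined with the first approach from the original question, that would give something like the following. Keeping @type alongside the language tag is optional and shown only as one possibility.

```xml
<placeName>
  <choice>
    <seg xml:lang="ar">الموصل</seg>
    <seg xml:lang="ar-Latn" type="transliteration">al-Mawṣil</seg>
  </choice>
</placeName>
```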

Cheers,
Martin
