mixed languages-language switching


Valentina Bella Lanza
Hi everybody,
My apologies if this question has come up before, but I am new to the community.
I'm currently working on encoding a medieval manuscript written in three languages. The content is the Song of Songs in the original language (Hebrew) and its translations into Aramaic and Judeo-Arabic.
The switching between the three languages is arbitrary: it often occurs within the same line, and often without being highlighted.
My question is: what is the most common and logical method that you would use (or have already used) to tag the language switching in the text?
I'm not sure about using the <foreign> element within a line at the point where the language changes, after declaring the language of the paragraph, as in:
<p xml:lang="Hebr">
                  <lb n="1"/>התחלתי לכתוב שיר השירים בעדת אל אלים
                  <lb n="2"/> שירין ותושבחן די אמר שלמה<foreign  xml:lang="Armi"> שיר השירים אשר לשלמה            
                  <lb n="3"/> נביאה מלכא דישראל ברוח נבואה קדם רבון כל עלמא
                     <lb n="4"/>ה עשרתי שירתא</foreign>אתאמרו בעלמא הדין שירה דין

because, correct me if I'm wrong, it gives a sense of hierarchy between the languages. I would rather use <foreign> to mark a single foreign word within the paragraph (if I find a Latin word in the Hebrew section, for example).

Thanks for your help
Kind regards,

Valentina B. Lanza

--
Ph.D. Student
Sapienza University of Rome

Re: mixed languages-language switching

Gioele Barabucci-2
On 09.03.2018 at 11:34, Valentina Bella Lanza wrote:
> <p xml:lang="Hebr"> [...] <foreign xml:lang="Armi">

Hello Valentina,

a quick side note: according to BCP 47 (the standard for what one should
put inside a `xml:lang` attribute) `Hebr` is not a valid value.

You may want to use `he-Hebr` (Hebrew language written using the Hebrew
script) or simply `he` (Hebrew language written in any script).

Common valid structures for `xml:lang` are

* "lang" (`it`),
* "lang-Script" (`it-Latn`),
* "lang-Script-LOCALE" (`it-Latn-CH`),
* "lang-LOCALE" (`it-CH`).

For the Aramaic part, I suppose that the right tag should be `arc-Hebr`
(i.e. Aramaic written using the Hebrew script), is that right?
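
As a rough illustration of the four shapes listed above, here is a minimal Python sketch that checks only the *shape* of a tag (language, optional script, optional region). It is not a BCP 47 validator: it ignores variants, extensions and private-use subtags, and it does not look anything up in the IANA subtag registry. The function name `looks_like_simple_tag` is made up for this example.

```python
import re

# Shape check only: language[-Script][-REGION]. BCP 47 tags are
# case-insensitive, hence re.IGNORECASE. This is an illustrative
# simplification, not the full BCP 47 grammar.
SIMPLE_TAG = re.compile(
    r"^[a-z]{2,3}"                # primary language subtag, e.g. "he", "arc"
    r"(-[a-z]{4})?"               # optional script subtag, e.g. "Hebr", "Latn"
    r"(-([a-z]{2}|[0-9]{3}))?$",  # optional region subtag, e.g. "CH"
    re.IGNORECASE,
)


def looks_like_simple_tag(tag: str) -> bool:
    """Return True if tag has the shape language[-Script][-REGION]."""
    return SIMPLE_TAG.match(tag) is not None


# A bare script code such as "Hebr" or "Armi" has no language subtag,
# so it does not fit any of the shapes above.
for tag in ["he", "he-Hebr", "arc-Hebr", "it-Latn-CH", "it-CH", "Hebr", "Armi"]:
    print(tag, looks_like_simple_tag(tag))
```

A tag that passes this check may still use subtags that are not registered, so a real workflow would also need the registry.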

Regards,

--
Gioele Barabucci <[hidden email]>
Re: mixed languages-language switching

Hayim Lapin
In reply to this post by Valentina Bella Lanza
Hi. Interesting project. When does the text date from? 
I would think that, to convey the sense that the text is made up of three significant components, you might want a structure that groups the three component languages together:
<ab n="1"><!-- use canonical vs structure for @n? -->
    <seg xml:lang="he">... Hebrew ... </seg>
    <seg xml:lang="arc-??">... Aramaic... </seg>
    <seg xml:lang="ar-??">... Judaeo-Arabic... </seg>
</ab>
Or, if the relationship between the three language components is less regular, something like linked <ab>s or <seg>s, as follows:
<ab corresp="#point-to-verse-structure" xml:lang="he" xml:id="someID">  ... </ab>
<ab corresp="#someID" xml:lang="arc-??">  ... </ab>
<ab corresp="#someID" xml:lang="ar-??">  ... </ab>
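
A minimal Python sketch of how the linked structure above could be read back, assuming each translation points at the Hebrew <ab> through @corresp. The sample fragment, the IDs, the placeholder text and the helper name `group_by_verse` are invented for illustration. `jpa` is the code chosen later in this thread for the Aramaic, and `jrb` for Judaeo-Arabic is only an assumption. The TEI namespace is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Attributes in the predeclared xml: namespace (xml:id, xml:lang).
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical fragment with placeholder text, not the manuscript's.
SAMPLE = """
<div>
  <ab xml:id="v1" xml:lang="he">Hebrew verse 1</ab>
  <ab corresp="#v1" xml:lang="jpa">Aramaic rendering of verse 1</ab>
  <ab corresp="#v1" xml:lang="jrb">Judaeo-Arabic rendering of verse 1</ab>
</div>
"""


def group_by_verse(xml_text):
    """Group <ab> elements under the xml:id they carry or point to."""
    root = ET.fromstring(xml_text)
    groups = {}
    for ab in root.iter("ab"):
        lang = ab.get(XML_NS + "lang")
        own_id = ab.get(XML_NS + "id")
        target = ab.get("corresp", "")
        # An <ab> with its own xml:id anchors a group; the others
        # attach to the group named by their @corresp pointer.
        key = own_id if own_id is not None else target.lstrip("#")
        groups.setdefault(key, {})[lang] = (ab.text or "").strip()
    return groups


for verse_id, versions in group_by_verse(SAMPLE).items():
    print(verse_id, versions)
```

The point of the design is that no language wraps another: the three versions sit side by side and are tied together only by the pointer.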





Robert H. Smith Professor of Jewish Studies and
Professor of History
Department of History
University of Maryland
2115 Francis Scott Key Hall
College Park, MD 20742
301 405 4296 | [hidden email]
www.digitalmishnah.org | www.eRabbinica.org

Director
Joseph and Rebecca Meyerhoff
Program and Center for Jewish Studies
University of Maryland
4141 Susquehanna Hall
College Park, MD 20742
301 405 4975 | [hidden email]
www.jewishstudies.umd.edu


Re: mixed languages-language switching

Hugh Cayless-2
In reply to this post by Gioele Barabucci-2
I agree that <foreign> may be inappropriate here. I would probably just use something neutral, like <seg>, with an @xml:lang on it. I'd just add to Gioele's note that BCP 47 recommends *not* using the script subtag where there is a "default" script for that language, so "he-Hebr" is redundant and should be just "he" (see https://tools.ietf.org/html/bcp47#section-2.2.3 point 4).
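
To make the <seg> suggestion concrete, here is a small Python sketch that reads language runs back out of a fragment marked up with neutral <seg> elements. The fragment uses placeholder text, not the manuscript's; `language_runs` is a made-up helper; the codes `he` and `jpa` are examples only, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: each run of text sits in its own <seg>,
# and a run may cross a line break (<lb/>) without being split.
SAMPLE = """
<ab>
  <lb n="1"/><seg xml:lang="he">Hebrew run</seg>
  <lb n="2"/><seg xml:lang="jpa">Aramaic run, first part
  <lb n="3"/>Aramaic run, second part</seg><seg xml:lang="he">Hebrew run again</seg>
</ab>
"""


def language_runs(xml_text):
    """Return (language, text) pairs for every <seg>, in document order."""
    root = ET.fromstring(xml_text)
    runs = []
    for seg in root.iter("seg"):
        # itertext() also picks up text that follows an inner <lb/>.
        text = " ".join("".join(seg.itertext()).split())
        runs.append((seg.get(XML_LANG), text))
    return runs


for lang, text in language_runs(SAMPLE):
    print(lang, "|", text)
```

Because each run carries its own @xml:lang, no language is treated as the host of the others, and a <seg> can contain <lb/> when a run continues onto the next line.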

All the best,
Hugh

Re: mixed languages-language switching

Valentina Bella Lanza
Thanks everyone for all the great advice.

Dear Professor Smith, the manuscript dates from the 14th-15th century.

In answer to Gioele Barabucci: I chose "jpa" to tag the Aramaic because, according to BCP 47, it is the code corresponding to Jewish Palestinian Aramaic.

Regards
Valentina B. Lanza


Re: mixed languages-language switching

Mylonas, Elli
agreed on <seg>

in IIP, in an inscription that is primarily in one language but has a single word from another language, we use <foreign>

but in the case of languages that are equally balanced, it should be <seg>. Not sure how many of those we have encountered.

  --elli


