Re: TEI-L Digest - 2 Feb 2017 to 3 Feb 2017 (#2017-28)

Kathryn Tomasek-2
Re: Piotr's friendly nudge about TEI-MM minutes from Vienna.

The fact that the minutes have not been posted is my fault. There was a death in my family in the fall, and I have fallen terribly behind in all areas of business.

With sincere apologies,

Kathryn Tomasek

> On Feb 4, 2017, at 12:00 AM, TEI-L automatic digest system <[hidden email]> wrote:
>
> There are 16 messages totaling 3402 lines in this issue.
>
> Topics of the day:
>
>  1. Editing Arabic TEI (5)
>  2. visiting fellows, Maynooth University
>  3. no worries
>  4. locus from line 10 of folio 1r to line 5 of folio 3v (2)
>  5. failing to locate the minutes of recent Board meetings (2)
>  6. RIDE 5 out!
>  7. XML character encoding (was "Re: [TEI-L] no worries")
>  8. standardizing linguistic encoding (3)
>
> ----------------------------------------------------------------------
>
> Date:    Fri, 3 Feb 2017 09:46:06 +0100
> From:    Frederik Elwert <[hidden email]>
> Subject: Re: Editing Arabic TEI
>
> Dear David,
>
> I helped a colleague set up Oxygen for editing an Arabic manuscript. It
> works quite okay. Indeed, the Author view is more suited for substantial
> work on the text, but I guess this is an inherent problem of XML for RTL
> languages: The tags themselves run LTR, the text in between in the other
> direction, so the cursor keeps swapping direction, making the
> behaviour slightly unpredictable.
>
> Best,
> Frederik
>
>
>
>> Am 02.02.2017 um 22:34 schrieb Birnbaum, David J:
>> Dear TEI-L,
>>
>> I don't work with Arabic texts myself, but some of my students and
>> colleagues do, and one has just asked me to recommend an XML editor. I use
>> Oxygen for all of my own work, and I've watched the video that Syncro Soft
>> produced about editing Arabic in the Oxygen Author view, but before I
>> point my colleague in that and only that direction, I wanted to ask what
>> others have used to edit Arabic TEI documents. When I've poked at Arabic
>> in Oxygen before (older versions, though, so the following may no longer
>> be the case), the Text view sometimes stranded angle brackets in the wrong
>> place. The Author view obviously (no angle brackets) didn't do that, but
>> it broke onto new lines in places that made sense from an engineering
>> perspective, but that made the continuous text harder to read. All in all,
>> it was usable, but should my colleague also be considering alternatives?
>>
>> Thanks,
>>
>> David
>>
>
> --
> Dr. Frederik Elwert
>
> Digital Humanities Coordinator
> Center for Religious Studies
> Ruhr-University Bochum
>
> Universitätsstr. 90a
> D-44780 Bochum
>
> Phone +49(0)234 32-23024
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 09:52:16 +0100
> From:    Markus Schnöpf <[hidden email]>
> Subject: Re: Editing Arabic TEI
>
> In our project, Corpus Coranicum, and some satellite projects (early Arabic poetry, Paleocoran), we use Oxygen XML for editing the comments. We have also developed a font, Coranica (http://corpuscoranicum.de/about/tools), that displays Arabic text better than standard fonts do (Arabic script is normally too small when used in a mixed writing environment). [We are working on an English translation of that page; at the moment it is in German only, sorry.] We also want to prepare our common editing environment, ediarum (a "plugin" for Oxygen that gives editors a set of tags normally needed in digital editing), for use with Arabic and other RTL writing systems (see https://github.com/telota/ediarum).
>
> Best, Markus
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 10:53:07 +0200
> From:    Radu Coravu <[hidden email]>
> Subject: Re: Editing Arabic TEI
>
> Hi,
>
> As a developer working for Oxygen XML Editor I fully agree with
> Frederik's analysis. The Author visual editing mode should be more
> comfortable for RTL editing.
>
> About David's remark:
>
>> The Author view obviously (no angle brackets) didn't do that, but
>> it broke onto new lines in places that made sense from an engineering
>> perspective, but that made the continuous text harder to read.
>
> If you have some sample TEI documents and give us some hints about what
> does not work as expected we could try to improve the behavior in a
> future version. Unfortunately we do not use RTL writing ourselves so
> sometimes it's hard for us to understand what the expected editing
> behaviors should be, that's why we need help with this.
>
> Regards,
> Radu
>
> Radu Coravu
> <oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 10:14:27 +0100
> From:    Gioele Barabucci <[hidden email]>
> Subject: Re: Editing Arabic TEI
>
>> Am 03.02.2017 um 09:53 schrieb Radu Coravu:
>> If you have some sample TEI documents and give us some hints about what
>> does not work as expected we could try to improve the behavior in a
>> future version. Unfortunately we do not use RTL writing ourselves so
>> sometimes it's hard for us to understand what the expected editing
>> behaviors should be, that's why we need help with this.
>
> Dear Radu, dear participants,
>
> we of the Averroes project (Uni of Cologne, DARE, CCeH) [1] have plenty
> of material I can send you to illustrate the "ergonomic" problems that
> editors are facing when using oXygen to edit Arabic but also Hebrew
> texts. I'll contact you privately.
>
> Mostly it has to do with a clash of expectations between what happens
> when letters are typed and how things appear on the screen, for example
> when Latin characters (tags or punctuation marks) and Arabic characters
> are on the same line.
>
> A concrete example. Suppose that A, B and C are Arabic letters and | is
> the cursor. If you type "<line>", then A, then B then C, you get the
> following result (correct)
>
> <line>|CBA
>
> If, at that point, you type a period, you will get the incorrect
>
> <line>|CBA.
>
> instead of correct version
>
> <line>|.CBA
>
> I used "incorrect", but the behaviour is not really incorrect. As David
> said, one can see the engineering reasons behind it, but the editors are
> used to other word processing applications and the behaviour of oXygen
> just feels wrong to them.
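
A minimal sketch of why the full stop behaves differently from the letters in the example above, assuming alef, beh and teh as stand-ins for A, B and C (the sample string is hypothetical). It only looks up each character's bidirectional class in the Unicode Character Database: the tag characters are left-to-right or neutral, the Arabic letters are strongly right-to-left, and the full stop has a weak class whose visual position is resolved from its context and the base direction of the line. It does not reproduce what oXygen itself does.

```python
import unicodedata

# Hypothetical sample: an LTR tag, three Arabic letters (alef, beh, teh
# standing in for A, B, C) and a full stop, in the order they are typed.
typed = "<line>" + "\u0627\u0628\u062a" + "."


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for every character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


for ch, cls in bidi_classes(typed):
    # L = left-to-right, AL = Arabic letter (right-to-left),
    # ON = other neutral, CS = common separator (weak)
    print(f"U+{ord(ch):04X} {cls}")
```

Because the full stop carries no strong direction of its own, a renderer working in a left-to-right line places it at the right-hand end, which matches the result the editors perceive as wrong.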
>
> This is just an example. There are plenty of more complicated cases I
> can illustrate. Solving them would improve the quality of life of the
> editors and transcribers very much. ;)
>
> Regards,
>
> [1] http://averroes.uni-koeln.de/
>
> --
> Gioele Barabucci <[hidden email]>
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 09:18:34 +0000
> From:    Susan Schreibman <[hidden email]>
> Subject: visiting fellows, Maynooth University
>
> *Visiting Fellowship Scheme*
>
> *Faculty of Arts, Celtic Studies and Philosophy*
>
> *Maynooth University*
>
> *Academic Year 2017-18*
>
> The Faculty of Arts, Celtic Studies and Philosophy and /An Foras Feasa
> /Research Institute at Maynooth University are pleased to announce the
> call for applications for the Visiting Fellowship Scheme in the
> Humanities for the academic year 2017-18. We are especially interested
> in applications from colleagues in the TEI community.
>
> The duration of the visiting fellowship is envisaged as ordinarily
> between one and six months; applications for a shorter or longer
> duration will be considered. Preference will be given to Fellows whose
> residence coincides with term time and to Fellows from outside
> Ireland. Only in exceptional circumstances will Fellowships be awarded
> to researchers normally resident in Ireland.
>
> Fellows will receive office space and office facilities from /An Foras
> Feasa /in the Iontas Building, a state-of-the-art humanities research
> institute, along with full library access and computer facilities. There
> is a robust and welcoming research culture at Maynooth University and
> Fellows will be facilitated in achieving their research goals while in
> residence.
>
> Fellows will be asked to provide one seminar or a workshop to
> postgraduate students in the Researcher’s field of interest, as well as
> a guest lecture to the University community. A limited number of travel
> stipends of €500 will be available; preference will be given to
> applicants with limited institutional funding.
>
> The current call will close on *31 March 2017*. Thereafter applications
> will be considered on a rolling basis. To apply, please complete the
> form available here:
> https://www.maynoothuniversity.ie/foras-feasa/visiting-fellowships
>
> and return it to [hidden email].
>
> For informal queries, please contact Professor Susan Schreibman,
> Director of An Foras Feasa ([hidden email])
>
> Note: The fellowship does not include accommodation. However, short-stay
> accommodation may be booked through Maynooth Campus Conference and
> Accommodation (see http://www.maynoothcampus.com) at very reasonable
> rates. Alternatively, for longer stays, a variety of accommodation is
> available in the Maynooth vicinity.
>
>
> --
> Susan Schreibman
> Professor of Digital Humanities
> Director of An Foras Feasa
> Iontas Building
> Maynooth University
> Maynooth, Co. Kildare
>
> email: [hidden email]
> phone: +353 1 708 3451
> fax:  +353 1 708 4797
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 10:26:13 +0100
> From:    Gioele Barabucci <[hidden email]>
> Subject: Re: no worries
>
>> Am 03.02.2017 um 00:39 schrieb Paterson, Duncan:
>> The other problem relates to choosing utf8 or utf16 encoding, but I
>> don't think double byte characters applies to Arabic.
>
> Allow me a technical consideration and advice. UTF-16 is only a useless
> historical artefact. It should not be used in any new project. The only
> sane choices are UTF-8 and (if really needed) UCS-4.
>
> UTF-16 is the worst of all encodings: it wastes bits like UCS-2/4, is
> incompatible with ASCII and is as computationally hard to work with as
> UTF-8.
>
> One should use either UTF-8 for ASCII compatibility and space savings or
> UCS-4 for speed of computation (but only under certain particular
> circumstances).
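
A small sketch of the space argument, assuming a hypothetical TEI line that holds a four-letter Arabic word. UCS-4 is represented here by Python's `utf-32-le` codec, and the `-le` codec variants are used so that no byte order mark is counted.

```python
# Hypothetical sample: ASCII markup around a four-letter Arabic word.
sample = "<l>\u0643\u062a\u0627\u0628</l>"


def encoded_sizes(text):
    """Return the number of bytes the text occupies in each encoding."""
    return {
        name: len(text.encode(name))
        for name in ("utf-8", "utf-16-le", "utf-32-le")
    }


print(encoded_sizes(sample))
# {'utf-8': 15, 'utf-16-le': 22, 'utf-32-le': 44}
```

Each Arabic letter takes two bytes in both UTF-8 and UTF-16, so for marked-up Arabic text the difference comes from the ASCII markup, which UTF-8 stores in one byte per character.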
>
> Regards,
>
> --
> Gioele Barabucci <[hidden email]>
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 10:33:13 +0100
> From:    Torsten Schassan <[hidden email]>
> Subject: Re: locus from line 10 of folio 1r to line 5 of folio 3v
>
> Dear Pietro,
>
> the attribute @n doesn't have the semantics you need: it is just a number
> or label, but not necessarily (and not predictably!) that of a line.
>
> A solution I can think of would be to define the line either
>
> - as a textual line, part of the "transcription" somewhere in <text> or
> <sourceDoc>, and refer to it from your <locus>, or
>
> - as zone in some graphical element (surface, graphic) and point to that
> using @facs.
>
> I don't think there's a defined way of including this information in
> @from/@to of the locus element, although it should be possible to think
> of the line as a fragment of the page, and thus refer to that fragment
> in the XPointer manner: uri#fragmentIdentifier, here e.g. 1r#l10. Still,
> this would require identifying the fragment somewhere with an @xml:id.
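
A minimal sketch of the last point, assuming a hypothetical `<sourceDoc>` fragment in which each relevant line carries an `@xml:id`. The identifiers `f1r-l10` and `f3v-l5` are invented for illustration (an XML ID cannot start with a digit, so `1r` itself would not do), and which attribute of `<locus>` should carry such pointers is left open, as it is in the message above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical transcription fragment; element choice and ids are assumptions.
SOURCE = """<sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
  <surface n="1r">
    <line xml:id="f1r-l10" n="10">first line of the text</line>
  </surface>
  <surface n="3v">
    <line xml:id="f3v-l5" n="5">last line of the text</line>
  </surface>
</sourceDoc>"""


def resolve(root, pointer):
    """Return the element whose xml:id matches a '#id' pointer, or None."""
    target = pointer.lstrip("#")
    for element in root.iter():
        if element.get(XML_ID) == target:
            return element
    return None


root = ET.fromstring(SOURCE)
start = resolve(root, "#f1r-l10")
end = resolve(root, "#f3v-l5")
print(start.get("n"), end.get("n"))
```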
>
>
> Best, Torsten
>
>
>> Am 02.02.2017 um 23:06 schrieb Pietro Liuzzo:
>> Dear all,
>>
>> what is the best way to define the location of a text which starts at a
>> certain line of a folio and ends at a certain line of another like from
>> line 10 of folio 1r to line 5 of folio 3v
>>
>> <locus from="1r" n="10"/> <locus to="3v" n="5"/> for example?
>>
>> thanks a lot!
>> Pietro
>>
>
>
> --
> Torsten Schassan - Digitale Editionen, Abteilung Handschriften und
> Sondersammlungen
> Herzog August Bibliothek, Postfach 1364, D-38299 Wolfenbuettel, Tel.:
> +49-5331-808-130 (Fax -165)
> Handschriftendatenbank* http://diglib.hab.de/?db=mss
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 13:01:59 +0100
> From:    Piotr Bański <[hidden email]>
> Subject: failing to locate the minutes of recent Board meetings
>
> Dear Board,
>
> I am not able to find the minutes of the meeting ("Business meeting"
> and/or Board F2F meeting) that was/were held at the last TEI-MM. I have
> looked for them in the official TEI-C space and in the wiki. I was sure
> I saw a message from Michelle about the minutes and just put it aside
> for later, but now I'm beginning to think that this could have been a
> flashback from after Lyon. My perception of time has gotten very weird
> over the years.
>
> If the minutes are indeed missing, please treat this message as a
> friendly nudge to get them done at some point :-) And if they are out
> there and it's just me who fails at locating them, I'll be grateful for
> the link. I thought that I could use the minutes to solve on my own the
> issue of the potential mistake in the dates published for the upcoming
> TEI-MM -- I would love to have the proper dates in my calendar because
> of the other conference events taking place in the 2nd part of 2017 that
> I should plan ahead for.
>
> Thanks in advance and best regards,
>
>   Piotr
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 13:25:17 +0100
> From:    Franz Fischer <[hidden email]>
> Subject: RIDE 5 out!
>
> Dear TEI community,
>
> I am very happy to announce that we just published issue 5 of RIDE, the
> review journal for digital editions and resources.
>
> As in the previous three issues, we have 5 reviews (all in English) that
> critically assess scholarly digital editions. For your convenience, this
> is the table of contents:
>
> • Jane Austen’s Fiction Manuscripts, by Michelle Levy
> • Literary drafts, genetic criticism and computational technology. The
> Beckett Digital Manuscript Project, by Anna-Maria Sichani
> • Lope de Vega's La Dama Boba. Critical edition and digital archive, by
> Antonia Rojas Castro
> • The 1641 Depositions, by Walter Scholger
> • The William Blake Archive, by Kendal Crawford and Michelle Levy
>
> All reviews can be accessed for free via our webpage: http://ride.i-d-e.de
>
> Enjoy the ride!
>
> Franz
>
>
> --
> Dr. Franz Fischer
> Cologne Center for eHumanities
> Universität zu Köln, Universitätsstr. 22, D-50923 Köln
> +49 - (0)221 - 470 - 4056
> [hidden email]
> @vranzvischer
>
> cceh.uni-koeln.de, dixit.uni-koeln.de
> i-d-e.de, ride.i-d-e.de
> digitalmedievalist.org, digitalmedievalist.org/journal
> guillelmus.uni-koeln.de, confessio.ie
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 13:02:21 +0000
> From:    "Dalmau, Michelle Denise" <[hidden email]>
> Subject: Re: failing to locate the minutes of recent Board meetings
>
> Hi, Piotr,
>
> The business meeting minutes will be posted in the next few days. I plan to summarize them and announce the 2017 officers!
>
> Thanks for the nudge,
> --Michelle
>
> Typos and Autocorrections Courtesy of my iPhone
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 14:30:14 +0100
> From:    Franz Fischer <[hidden email]>
> Subject: Re: locus from line 10 of folio 1r to line 5 of folio 3v
>
> If you don't want to point and link to the exact location on the page
> (and if you want to avoid too much encoding) a simple solution (yet not
> as powerful as Torsten's suggestion) could be as follows:
> <locus from="1r" to="3v">fol. 1r, l. 10 to fol. 3v, l.5</locus>
>
> If you prefer to store all information in the markup - not sure about
> something like this:
> <locus from="1r_10" to="3v_5"/>
>
> Franz
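
A short sketch of how a processor might read the second, tentative form above. The `folio_line` convention (`1r_10`) is only the suggestion floated in this message, not a syntax defined by TEI, and the function name and pattern are assumptions made for illustration.

```python
import re

# Assumed pattern: folio number, recto/verso letter, underscore, line number.
LOCUS_PART = re.compile(r"^(?P<folio>\d+[rv])_(?P<line>\d+)$")


def parse_locus_part(value):
    """Split a value such as '1r_10' into ('1r', 10)."""
    match = LOCUS_PART.match(value)
    if match is None:
        raise ValueError(f"not in folio_line form: {value!r}")
    return match.group("folio"), int(match.group("line"))


print(parse_locus_part("1r_10"), parse_locus_part("3v_5"))
# ('1r', 10) ('3v', 5)
```

Any software consuming such values would need to know the convention in advance, which is the trade-off against the pointer-based approach.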
>
> --
> Dr. Franz Fischer
> Cologne Center for eHumanities
> Universität zu Köln, Universitätsstr. 22, D-50923 Köln
> +49 - (0)221 - 470 - 4056
> [hidden email]
> @vranzvischer
>
> cceh.uni-koeln.de, dixit.uni-koeln.de
> i-d-e.de, ride.i-d-e.de
> digitalmedievalist.org, digitalmedievalist.org/journal
> guillelmus.uni-koeln.de, confessio.ie
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 09:00:07 -0500
> From:    Syd Bauman <[hidden email]>
> Subject: XML character encoding (was "Re: [TEI-L] no worries")
>
> The sentiment, if a bit overstated, is correct IMHO. Another
> disadvantage is that an XML file encoded in UTF-16 must begin with a byte
> order mark (U+FEFF). Does the operating system handle that? Does the
> XML editor? Do I?
>
> The full name for UCS-4 in an XML declaration is "ISO-10646-UCS-4",
> and this is one of the few places where XML is case insensitive. (So
> a processor should recognize "ISO-10646-ucs-4" just as well.)
>
> All that said, I don't know how to get my operating system to read &
> write UCS-4 (or even UTF-16, not that I care), so I always use UTF-8.
> :-|
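
A small sketch of the byte order mark point, assuming a hypothetical one-element document. Python's `utf-16` codec prepends U+FEFF in the machine's native byte order, while its `utf-8` codec writes none; the helper `sniff_bom` is invented for illustration.

```python
import codecs


def sniff_bom(data):
    """Return the encoding family implied by a leading byte order mark, or None."""
    # UTF-32 is tested first because its little-endian mark begins with
    # the same two bytes as the UTF-16 little-endian mark.
    candidates = (
        ("utf-32", codecs.BOM_UTF32_LE),
        ("utf-32", codecs.BOM_UTF32_BE),
        ("utf-8", codecs.BOM_UTF8),
        ("utf-16", codecs.BOM_UTF16_LE),
        ("utf-16", codecs.BOM_UTF16_BE),
    )
    for name, bom in candidates:
        if data.startswith(bom):
            return name
    return None


as_utf16 = '<?xml version="1.0" encoding="UTF-16"?><TEI/>'.encode("utf-16")
as_utf8 = '<?xml version="1.0" encoding="UTF-8"?><TEI/>'.encode("utf-8")

print(sniff_bom(as_utf16), sniff_bom(as_utf8))
```

The UTF-8 file stays readable by tools that know nothing about byte order marks, which is part of why it is the low-risk default.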
>
>> Allow me a technical consideration and advice. UTF-16 is only a
>> useless historical artefact. It should not be used in any new
>> project. The only sane choices are UTF-8 and (if really needed)
>> UCS-4.
>>
>> UTF-16 is the worst of all encodings: it wastes bits like UCS-2/4,
>> is incompatible with ASCII and is as computationally hard to work
>> with as UTF-8.
>>
>> One should use either UTF-8 for ASCII compatibility and space
>> savings or UCS-4 for speed of computation (but only under certain
>> particular circumstances).
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 17:05:05 +0100
> From:    Emmanuel NGUE UM <[hidden email]>
> Subject: Re: standardizing linguistic encoding
>
> Hi,
>
> I am an Africa-based linguist, and I have been following many of the
> discussions going on over the TEI mailing list.
>
> I am not a TEI practitioner per se, but I am aware of the many application
> scenarios of this technology, including text corpora building.
>
> A couple of months ago, I sent an e-mail around via TEI mailing list asking
> whether anyone knew of any TEI based/inspired framework for the encoding of
> prosodic phenomena such as tones, especially in African tone languages. I
> got one or two responses from members. Unfortunately these responses did
> not address my specific concern.
>
> I wish to join on-going discussions about 'standardizing linguistic
> encoding', to bring to the fore of TEI standards development, the issue of
> "tone encoding".
>
> For the sake of clarification and given that not every one is necessarily
> an expert in tone languages, let me explain by examples what tone is in
> African tone languages.
>
> Given the following tokens from Basaa, a Bantu language spoken in Cameroon:
>
> (1) hól : to sharpen
>
> (2) hòl : to pay the dowry
>
> (3) hôl (as in *á hôl*): let him sharpen
>
> (4) hŏl : pay the dowry! (imperative)
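
A minimal sketch of how the four tokens differ at the character level, assuming the precomposed letters as they appear above (the fourth is taken as o with breve). Unicode canonical decomposition (NFD) separates the shared segmental base from the combining mark that carries the tone; how such marks should be represented in TEI is the open question, and this sketch does not answer it.

```python
import unicodedata

# The four Basaa tokens, written with explicit code points.
TOKENS = ["h\u00f3l", "h\u00f2l", "h\u00f4l", "h\u014fl"]


def split_tone_marks(word):
    """Return (base letters, names of combining marks) for a word."""
    base, marks = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            marks.append(unicodedata.name(ch))
        else:
            base.append(ch)
    return "".join(base), marks


for token in TOKENS:
    print(token, split_tone_marks(token))
```

All four tokens reduce to the same base, `hol`, so any encoding that drops or normalises away the diacritic loses the only thing that distinguishes them.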
>
>
>
> 2017-02-02 14:03 GMT+01:00 Eduard Drenth <[hidden email]>:
>
>> Thanks for your response! Standard in my case means practical, usable way
>> for encoding linguistic information in corpora using TEI.
>>
>> Indeed the theme is covered by
>> https://github.com/LingSIG/wordAttributes/wiki. Good to know of
>> http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists as well.
>>
>> We have chosen to continue along our current path: it doesn't deviate too
>> much from uncustomized TEI, offers good support for editing and querying,
>> satisfies our linguists, adheres to http://universaldependencies.org and
>> is easy to convert to the 'real standard' when it is released.
>>
>> Perhaps our approach can be useful input for
>> https://github.com/LingSIG/wordAttributes; it is the result of quite
>> extensive testing and discussion.
>>
>> Eduard Drenth, Software Architekt
>>
>> [hidden email]
>>
>> Doelestrjitte 8
>> 8911 DX  Ljouwert
>> +31 58 234 30 47
>>
>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>
>> ________________________________________
>> From: Piotr Bański <[hidden email]>
>> Sent: Thursday, February 2, 2017 12:24 PM
>> To: Eduard Drenth; [hidden email]; Phillip Ströbel
>> Subject: Re: standardizing linguistic encoding
>>
>> Dear Eduard, [also addressing Philip and actually all... ]
>>
>> It's probably my conditioning as a member of various standardization
>> bodies that makes red lights flash in my head upon reading that you are
>> "developing a standard"... :-) I believe that standards are better
>> "developed" (or, more precisely, codified) on the basis of existing best
>> practices or other existing standards. As it is, I can observe that your
>> proposed encoding mixes up the level of tokens with the level of word
>> forms (in ISO MAF terminology[1]), and while it can be suitable for your
>> purposes, it is far from optimal in standardization terms.
>>
>> [at this point, the camera pans out]
>>
>> This year promises to be quite exciting for the TEI Linguistics SIG[2],
>> given that:
>>
>> (1) ISO LMF [3] is up for renewal and restructuring, and that several
>> teams (among others, from ENeL, PARTHENOS, CLARIN, and LingSIG) are
>> currently working on various modules for it,
>>
>> (2) ISO Tiger [4] is nearing publication (as in: weeks rather than
>> months) and opening a way for ISO TEIger, a TEI serialization of the ISO
>> model for syntactic encoding,
>>
>> (3) there is a rising push for streamlining inline linguistic markup,
>> coming from, among others, Martin Mueller's Early Print Project, BBAW's
>> existing practice (presented by Susanne Haaf at various TEI meetings),
>> the Ancient Greek Dependency Treebank (represented in this mailing list
>> by Giuseppe Celano, I believe), and now we learn of Philip Ströbel's
>> project and yours. And there are others. A tiny reflex of that is
>> contained at the LingSIG GitHub space [5], which is only meant as a
>> _seed_ for collaborative effort rather than any personal statement.
>>
>> Andreas Witt and I are thinking of how to address and channel this
>> boiling mass of initiatives. One possibility could be to target the
>> upcoming TEI Members Meeting[6] and have a focused pre-conference
>> workshop designed to formulate a very precise and very concrete proposal
>> for grammatical encoding synchronized across inline, standoff and
>> dictionary markup, a proposal that we could submit to the TEI Technical
>> Council at the end of the day. "The day" seems distant, but if we want
>> to have a serious proposal at the end of it, work should start about now.
>>
>> May I invite all interested parties to join the Linguistics SIG mailing
>> list (by going to [7]) and GitHub space (by sending me, off-list, your
>> github username), and to, well, have a go at it... :-)
>>
>> Best regards,
>>
>>   Piotr
>>
>> [1]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=51934
>> [2]: http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists
>> [3]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=68516
>> [4]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62491
>> [5]: https://github.com/LingSIG/wordAttributes/wiki
>> [6]: http://members.tei-c.org/Events/meetings
>>
>>> On 02/01/17 16:51, Eduard Drenth wrote:
>>> Dear all,
>>>
>>> Here in Holland we are developing a standard to encode linguistic and
>>> lemma information for various word situations using TEI. We have
>>> tried several solutions (tei:fs/tei:f, tei:interp, tei:span, ...) and
>>> finally chose a TEI customization, which gives us standard XSD
>>> validation, editor support and a simple, focused solution. For linguistic
>>> terminology we use http://universaldependencies.org/ as much as possible.
>>>
>>>
>>> We are curious as to what you think, see below for details. We hope this
>>> solution may be of use for those who want to encode linguistic
>>> information using TEI. Also this may help standardizing linguistic
>>> encoding in TEI.
>>>
>>>
>>> If this all is worthwhile I would like to donate/publish the solution
>>> somewhere.
>>>
>>>
>>> snippet customization:
>>>
>>>
>>>            <schemaSpec ident="tdb" docLang="en" prefix="tei_" xml:lang="en">
>>>
>>>                ..
>>>
>>>                ..
>>>
>>>                <classSpec type="atts" ident="att.linguistics" module="analytics">
>>>
>>>                    <attList>
>>>                        <attDef ident="linguistics" ns="http://www.fryske-akademy.org/grammar/1.0">
>>>                            <desc>
>>>                                documentation....
>>>                            </desc>
>>>                            <datatype maxOccurs="unbounded">
>>>                                <dataRef key="teidata.enumerated"/>
>>>                            </datatype>
>>>                            <valList type="closed">
>>>                                <valItem ident="Features.Abbr">
>>>                                    <desc>Boolean feature. Is this an abbreviation?</desc>
>>>                                </valItem>
>>>                                <valItem ident="Features.Poss">
>>>                                    <desc>Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.</desc>
>>>                                </valItem>
>>>                                <valItem ident="PronType.Prs">
>>>                                    <desc>personal pronoun or determiner</desc>
>>>                                </valItem>
>>>
>>>                                 ..
>>>
>>>                                 ..
>>>
>>>
>>> example word encoding:
>>>
>>>
>>> <tei:w fa:linguistics="Pos.NOUN "
>>> lemmaRef="inprogress://lemmasystem/Hollands/frik/1"
>>> lemma="frik">Frik</tei:w>
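
A minimal sketch of how the space-separated `fa:linguistics` value in the word example above could be read back, assuming the `fa` prefix is bound to the namespace given in the customization snippet. The function name is invented, the `lemmaRef` attribute is left out for brevity, and tokens without a dot are returned with an empty value rather than rejected.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
FA_NS = "http://www.fryske-akademy.org/grammar/1.0"

# The word example from the message, with namespace declarations added.
WORD = (
    f'<tei:w xmlns:tei="{TEI_NS}" xmlns:fa="{FA_NS}" '
    'fa:linguistics="Pos.NOUN " lemma="frik">Frik</tei:w>'
)


def linguistic_features(w):
    """Return (feature, value) pairs from a w element's fa:linguistics attribute."""
    raw = w.get(f"{{{FA_NS}}}linguistics", "")
    pairs = []
    for token in raw.split():  # split() also absorbs the trailing space
        feature, _, value = token.partition(".")
        pairs.append((feature, value))
    return pairs


word = ET.fromstring(WORD)
print(word.text, word.get("lemma"), linguistic_features(word))
# Frik frik [('Pos', 'NOUN')]
```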
>>>
>>>
>>> example split word encoding:
>>>
>>>
>>> <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
>>>
>>> <tei:w fa:linguistics="Pos.ADV "
>>> lemmaRef="inprogress://lemmasystem/Hollands/al/3"
>>> lemma="al">al</tei:w><tei:w>wringende</tei:w>
>>>
>>> <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
>>>
>>> <tei:join result="w" scope="root" lemma="opstean" target="#staet-op-176
>>> #staet-op-179" lemmaRef="inprogress://lemmasystem/Hollands/opstean/1"
>>> fa:linguistics="th-si-pa Pos.VERB "/>
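
A sketch of how the split-word example above could be resolved by a consumer, assuming the words sit inside some common parent element (the `tei:s` wrapper is hypothetical) and omitting the `fa:linguistics` and `lemmaRef` attributes for brevity. It follows each `tei:join` to its targets by `@xml:id` and collects the parts in pointer order.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEI_NS = "http://www.tei-c.org/ns/1.0"

# The split-word example, wrapped in a hypothetical sentence element.
FRAGMENT = f"""<tei:s xmlns:tei="{TEI_NS}">
  <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
  <tei:w lemma="al">al</tei:w><tei:w>wringende</tei:w>
  <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
  <tei:join result="w" scope="root" lemma="opstean"
            target="#staet-op-176 #staet-op-179"/>
</tei:s>"""


def joined_parts(root):
    """Return (lemma, [part texts]) for every join element under root."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    results = []
    for join in root.iter(f"{{{TEI_NS}}}join"):
        targets = [ref.lstrip("#") for ref in join.get("target", "").split()]
        parts = [by_id[target].text for target in targets]
        results.append((join.get("lemma"), parts))
    return results


print(joined_parts(ET.fromstring(FRAGMENT)))
# [('opstean', ['staet', 'op'])]
```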
>>>
>>>
>>> example of a word consisting of several lemmas (we don't use this yet....):
>>>
>>>
>>> <tei:choice>
>>>  <tei:orig>
>>>    <tei:w fa:linguistics=".....">aint</tei:w>
>>>  </tei:orig>
>>>  <tei:reg>
>>>    <tei:w lemma="be" fa:linguistics="...">am</tei:w>
>>>    <tei:w lemma="not" fa:linguistics="...">not</tei:w>
>>>  </tei:reg>
>>> </tei:choice>
>>>
>>>
>>> Bye,
>>>
>>>
>>> Eduard Drenth, Software Architekt
>>>
>>>
>>> [hidden email]
>>>
>>>
>>> Doelestrjitte 8
>>>
>>> 8911 DX  Ljouwert
>>>
>>> +31 58 234 30 47
>>>
>>>
>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>
>>
>> --
>> Piotr Bański, Ph.D.
>> Senior Researcher,
>> Institut für Deutsche Sprache,
>> R5 6-13
>> 68-161 Mannheim, Germany
>>
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 17:38:23 +0100
> From:    Emmanuel NGUE UM <[hidden email]>
> Subject: Re: standardizing linguistic encoding
>
> Hi,
>
> I am an Africa-based linguist, and I have been following many of the
> discussions on the TEI mailing list.
>
> I am not a TEI practitioner per se, but I am aware of the many application
> scenarios of this technology, including text corpora building.
>
> A couple of months ago, I sent an e-mail to the TEI mailing list asking
> whether anyone knew of any TEI-based or TEI-inspired framework for encoding
> prosodic phenomena such as tones, especially in African tone languages. I
> got one or two responses from members. Unfortunately, these responses did
> not address my specific concern.
>
> I wish to join the ongoing discussion about 'standardizing linguistic
> encoding' in order to bring the issue of "tone encoding" to the fore of
> TEI standards development.
>
> For the sake of clarity, and given that not everyone is necessarily
> an expert in tone languages, let me explain with examples what tone is in
> African tone languages.
>
> Consider the following tokens from Basaa, a Bantu language spoken in Cameroon:
>
> (1) hól : to sharpen
>
> (2) hòl : to pay the dowry
>
> (3) hôl (as in *á hôl*): let him sharpen
>
> (4) hŏl : pay the dowry! (imperative)
>
> In (1) through (4), the difference in meaning of these words is attributed
> to the difference in relative pitch level of the syllable: "high" in (1),
> "low" in (2), contour or two-level "high-low" in (3), contour or two-level
> "low-high" in (4).
>
> While the semantics associated with tone levels in (1) and (2) is lexically
> encoded, the ones in (3) and (4) are complemented with grammatical
> information, namely hortative in (3) and imperative in (4), thus resulting
> in complex (contour) tone shapes in writing.
>
> Tone representation in the above examples is graphical, and is meant
> simply to anchor pitch melody; this form of representation says little
> about the semantics associated with a specific pitch level within and
> across words. This is mostly because pitch 'labels' (high, low,
> low-high, high-low) do not encode persistent meaning, but may instead
> trigger an array of grammatical information such as tense, mood,
> aspect, negation, etc., depending on the context.
>
> I personally believe that, for better processability and representation of
> textual information in tone languages, there is a need to develop an
> unambiguous encoding framework devoid of graphical representation of tones,
> and I believe TEI to be one possible response to this.
>
> Because TEI is an open standard which is meant to be tailored to the
> specific needs of users, I think it is our responsibility as Africanists and
> Bantuists to raise the TEI community's awareness of the need to account for
> the specificities of the languages we are working on when it comes to
> standardizing linguistic encoding.
>
> Best
>
> Emmanuel Ngué Um
> Language Archivist for ALORA
>
> 2017-02-02 14:03 GMT+01:00 Eduard Drenth <[hidden email]>:
>
>> Thanks for your response! Standard in my case means a practical, usable way
>> of encoding linguistic information in corpora using TEI.
>>
>> Indeed the theme is covered by
>> https://github.com/LingSIG/wordAttributes/wiki. Good to know of
>> http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists as well.
>>
>> We choose to continue along the chosen path: it doesn't deviate too much
>> from uncustomized TEI, offers good support for editing and querying,
>> satisfies our linguists, adheres to http://universaldependencies.org and
>> is easy to convert to the 'real standard' when it is released.
>>
>> Perhaps our approach can be useful input for
>> https://github.com/LingSIG/wordAttributes; it is the result of quite
>> extensive testing and discussion.
>>
>> Eduard Drenth, Software Architekt
>>
>> [hidden email]
>>
>> Doelestrjitte 8
>> 8911 DX  Ljouwert
>> +31 58 234 30 47
>>
>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>
>> ________________________________________
>> From: Piotr Bański <[hidden email]>
>> Sent: Thursday, February 2, 2017 12:24 PM
>> To: Eduard Drenth; [hidden email]; Phillip Ströbel
>> Subject: Re: standardizing linguistic encoding
>>
>> Dear Eduard, [also addressing Philip and actually all... ]
>>
>> It's probably my conditioning as a member of various standardization
>> bodies that makes red lights flash in my head upon reading that you are
>> "developing a standard"... :-) I believe that standards are better
>> "developed" (or, more precisely, codified) on the basis of existing best
>> practices or other existing standards. As it is, I can observe that your
>> proposed encoding mixes up the level of tokens with the level of word
>> forms (in ISO MAF terminology[1]), and while it can be suitable for your
>> purposes, it is far from optimal in standardization terms.
>>
>> [at this point, the camera pans out]
>>
>> This year promises to be quite exciting for the TEI Linguistics SIG[2],
>> given that:
>>
>> (1) ISO LMF [3] is up for renewal and restructuring, and that several
>> teams (among others, from ENeL, PARTHENOS, CLARIN, and LingSIG) are
>> currently working on various modules for it,
>>
>> (2) ISO Tiger [4] is nearing publication (as in: weeks rather than
>> months) and opening a way for ISO TEIger, a TEI serialization of the ISO
>> model for syntactic encoding,
>>
>> (3) there is a rising push for streamlining inline linguistic markup,
>> coming from, among others, Martin Mueller's Early Print Project, BBAW's
>> existing practice (presented by Susanne Haaf at various TEI meetings),
>> the Ancient Greek Dependency Treebank (represented in this mailing list
>> by Giuseppe Celano, I believe), and now we learn of Philip Ströbel's
>> project and yours. And there are others. A tiny reflex of that is
>> contained at the LingSIG GitHub space [5], which is only meant as a
>> _seed_ for collaborative effort rather than any personal statement.
>>
>> Andreas Witt and I are thinking of how to address and channel this
>> boiling mass of initiatives. One possibility could be to target the
>> upcoming TEI Members Meeting[6] and have a focused pre-conference
>> workshop designed to formulate a very precise and very concrete proposal
>> for grammatical encoding synchronized across inline, standoff and
>> dictionary markup, a proposal that we could submit to the TEI Technical
>> Council at the end of the day. "The day" seems distant, but if we want
>> to have a serious proposal at the end of it, work should start about now.
>>
>> May I invite all interested parties to join the Linguistics SIG mailing
>> list (by going to [7]) and GitHub space (by sending me, off-list, your
>> github username), and to, well, have a go at it... :-)
>>
>> Best regards,
>>
>>   Piotr
>>
>> [1]:
>> http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=51934
>> [2]: http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists
>> [3]:
>> http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=68516
>> [4]:
>> http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62491
>> [5]: https://github.com/LingSIG/wordAttributes/wiki
>> [6]: http://members.tei-c.org/Events/meetings
>>
>>
>>
>>
>>
>>
>>> On 02/01/17 16:51, Eduard Drenth wrote:
>>> Dear all,
>>>
>>> Here in Holland we are developing a standard to encode linguistic and
>>> lemma information for various word situations using TEI. We have been
>>> trying several solutions (tei:fs/tei:f, tei:interp, tei:span, ...) and
>>> finally chose a TEI customization, which gives us standard XSD
>>> validation, editor support and a simple, focused solution. For linguistic
>>> terminology we use http://universaldependencies.org/ as much as possible.
>>>
>>>
>>> We are curious as to what you think; see below for details. We hope this
>>> solution may be of use for those who want to encode linguistic
>>> information using TEI. It may also help to standardize linguistic
>>> encoding in TEI.
>>>
>>>
>>> If this all is worthwhile I would like to donate/publish the solution
>>> somewhere.
>>>
>>>
>>> snippet customization:
>>>
>>>
>>>            <schemaSpec ident="tdb" docLang="en" prefix="tei_"
>>> xml:lang="en">
>>>
>>>                ..
>>>
>>>                ..
>>>
>>>                <classSpec type="atts" ident="att.linguistics"
>>> module="analytics">
>>>
>>>                    <attList>
>>>                        <attDef ident="linguistics"
>>> ns="http://www.fryske-akademy.org/grammar/1.0">
>>>                            <desc>
>>>                                documentation....
>>>                            </desc>
>>>                            <datatype maxOccurs="unbounded">
>>>                                <dataRef key="teidata.enumerated"/>
>>>                            </datatype>
>>>                            <valList type="closed">
>>>                                <valItem ident="Features.Abbr">
>>>                                    <desc>Boolean feature. Is this an
>>> abbreviation?</desc>
>>>                                </valItem>
>>>                                <valItem ident="Features.Poss">
>>>                                    <desc>Boolean feature of pronouns,
>>> determiners or adjectives. It tells whether the word is
>>> possessive.</desc>
>>>                                </valItem>
>>>                                <valItem ident="PronType.Prs">
>>>                                    <desc>personal pronoun or
>>> determiner</desc>
>>>                                </valItem>
>>>
>>>                                 ..
>>>
>>>                                 ..
>>>
>>>
>>> example word encoding:
>>>
>>>
>>> <tei:w fa:linguistics="Pos.NOUN "
>>> lemmaRef="inprogress://lemmasystem/Hollands/frik/1"
>>> lemma="frik">Frik</tei:w>
>>>
>>>
>>> example split word encoding:
>>>
>>>
>>> <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
>>>
>>> <tei:w fa:linguistics="Pos.ADV "
>>> lemmaRef="inprogress://lemmasystem/Hollands/al/3"
>>> lemma="al">al</tei:w><tei:w>wringende</tei:w>
>>>
>>> <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
>>>
>>> <tei:join result="w" scope="root" lemma="opstean" target="#staet-op-176
>>> #staet-op-179" lemmaRef="inprogress://lemmasystem/Hollands/opstean/1"
>>> fa:linguistics="th-si-pa Pos.VERB "/>
>>>
>>>
>>> example of a word consisting of several lemmas (we don't use this yet):
>>>
>>>
>>> <tei:choice>
>>>  <tei:orig>
>>>    <tei:w fa:linguistics=".....">aint</tei:w>
>>>  </tei:orig>
>>>  <tei:reg>
>>>    <tei:w lemma="be" fa:linguistics="...">am</tei:w>
>>>    <tei:w lemma="not" fa:linguistics="...">not</tei:w>
>>>  </tei:reg>
>>> </tei:choice>
>>>
>>>
>>> Bye,
>>>
>>>
>>> Eduard Drenth, Software Architekt
>>>
>>>
>>> [hidden email]
>>>
>>>
>>> Doelestrjitte 8
>>>
>>> 8911 DX  Ljouwert
>>>
>>> +31 58 234 30 47
>>>
>>>
>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>
>>
>> --
>> Piotr Bański, Ph.D.
>> Senior Researcher,
>> Institut für Deutsche Sprache,
>> R5 6-13
>> 68-161 Mannheim, Germany
>>
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 18:12:48 +0100
> From:    Piotr Bański <[hidden email]>
> Subject: Re: standardizing linguistic encoding
>
> Hi Emmanuel,
>
> It's great to hear from you. You may be pleased to hear about a set of
> tools that can be used to encode the information you need very
> precisely, and to attach that information to objects of any granularity
> ("standard" tokens, morphs, phrases) and any sort (orthographic,
> prosodic, morphological). These tools have been defined jointly by the
> TEI and ISO, and the TEI description is free and well-tested, and you
> can read about it in the chapter on feature structures:
>
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html
>
> (I'd suggest skipping 18.11 when reading this chapter for the first time)
>
> These tools make it possible for you to describe practically any feature
> matrices needed in linguistics. And the TEI has mechanisms for attaching
> them to linguistic/textual objects.
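>
> A minimal sketch of what this could look like for two of the Basaa
> tokens from Emmanuel's message, hól (1) and hŏl (4). The elements
> (<fs>, <f>, <symbol>, <fvLib>) and the ana attribute come from the TEI
> Guidelines; the feature names, feature values and xml:id values are
> assumptions made up for illustration, not an agreed vocabulary:
>
> ```xml
> <div xmlns="http://www.tei-c.org/ns/1.0">
>   <!-- Hypothetical feature library: names and values are illustrative -->
>   <fvLib n="tone descriptions">
>     <fs xml:id="tone.H" type="tone">
>       <f name="pitch"><symbol value="high"/></f>
>     </fs>
>     <fs xml:id="tone.LH.imp" type="tone">
>       <f name="pitch"><symbol value="low-high"/></f>
>       <f name="mood"><symbol value="imperative"/></f>
>     </fs>
>   </fvLib>
>   <p>
>     <!-- ana points from each word to its feature structure -->
>     <w ana="#tone.H">hól</w>
>     <w ana="#tone.LH.imp">hŏl</w>
>   </p>
> </div>
> ```
>
> Because the pitch and its grammatical function are stated in the
> feature structure, a query could select words by pitch or mood without
> inspecting the diacritics.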
>
> If you'd like to pursue this further, you may be interested in joining
> the TEI Linguistics SIG mailing list at
>
> https://listserv.brown.edu/archives/cgi-bin/wa?A0=TEI-LINGUISTICS
>
> where we have just talked about cases slightly more complex than the
> kind of attribute-based markup discussed here.
>
> Best regards,
>
>   Piotr
>
>
> --
> Piotr Bański, Ph.D.
> Senior Researcher,
> Institut für Deutsche Sprache,
> R5 6-13
> 68-161 Mannheim, Germany
>
> ------------------------------
>
> Date:    Fri, 3 Feb 2017 22:05:14 +0000
> From:    "Birnbaum, David J" <[hidden email]>
> Subject: Re: Editing Arabic TEI
>
> Dear TEI-L,
>
> Thanks very much to all who responded to my inquiry about editing Arabic
> in <oXygen/>, and I have passed the information along to the colleague on
> whose behalf I was asking.
>
> Best,
>
> David
> __
>
> On 2017-03-02, 4:14 AM, "TEI (Text Encoding Initiative) public discussion
> list on behalf of Gioele Barabucci" <[hidden email] on behalf of
> [hidden email]> wrote:
>
>>> Am 03.02.2017 um 09:53 schrieb Radu Coravu:
>>> If you have some sample TEI documents and give us some hints about what
>>> does not work as expected we could try to improve the behavior in a
>>> future version. Unfortunately we do not use RTL writing ourselves so
>>> sometimes it's hard for us to understand what the expected editing
>>> behaviors should be, that's why we need help with this.
>>
>> Dear Radu, dear participants,
>>
>> We at the Averroes project (Uni of Cologne, DARE, CCeH) [1] have plenty
>> of material I can send you to illustrate the "ergonomic" problems that
>> editors face when using oXygen to edit Arabic as well as Hebrew
>> texts. I'll contact you privately.
>>
>> Mostly it has to do with a clash of expectations between what happens
>> when letters are typed and how things appear on the screen, for example
>> when Latin characters (tags or punctuation marks) and Arabic characters
>> are on the same line.
>>
>> A concrete example. Suppose that A, B and C are Arabic letters and | is
>> the cursor. If you type "<line>", then A, then B then C, you get the
>> following result (correct)
>>
>> <line>|CBA
>>
>> If, at that point, you type a period, you will get the incorrect
>>
>> <line>|CBA.
>>
>> instead of the correct version
>>
>> <line>|.CBA
>>
>> I used "incorrect", but the behaviour is not really incorrect. As David
>> said, one can see the engineering reasons behind it, but the editors are
>> used to other word processing applications and the behaviour of oXygen
>> just feels wrong to them.
>>
>> This is just an example. There are plenty of more complicated cases I
>> can illustrate. Solving them would improve the quality of life of the
>> editors and transcribers very much. ;)
>>
>> Regards,
>>
>> [1] http://averroes.uni-koeln.de/
>>
>> --
>> Gioele Barabucci <[hidden email]>
>
> ------------------------------
>
> End of TEI-L Digest - 2 Feb 2017 to 3 Feb 2017 (#2017-28)
> *********************************************************

Re: TEI-L Digest - 2 Feb 2017 to 3 Feb 2017 (#2017-28)

Piotr Banski
Hi Kathryn,

I'm sorry for your loss.
The nudge was indeed meant as friendly; I know how it is when deadlines
loom all around you. And I needed the minutes only to be able to
confirm the dates of the upcoming TEI-MM, which Martin Holmes has kindly
done off-list in the meantime. So there's no pressure from me in this
regard.

Best regards,

   Piotr

On 02/04/17 14:49, Kathryn Tomasek wrote:

> Re: Piotr's friendly nudge about TEI-MM minutes from Vienna.
>
> The fact that the minutes have not been posted is my fault. There was a death in my family in the fall, and I have fallen terribly behind in all areas of business.
>
> With sincere apologies,
>
> Kathryn Tomasek
>
>> On Feb 4, 2017, at 12:00 AM, TEI-L automatic digest system <[hidden email]> wrote:
>>
>> There are 16 messages totaling 3402 lines in this issue.
>>
>> Topics of the day:
>>
>>  1. Editing Arabic TEI (5)
>>  2. visiting fellows, Maynooth University
>>  3. no worries
>>  4. locus from line 10 of folio 1r to line 5 of folio 3v (2)
>>  5. failing to locate the minutes of recent Board meetings (2)
>>  6. RIDE 5 out!
>>  7. XML character encoding (was "Re: [TEI-L] no worries")
>>  8. standardizing linguistic encoding (3)
>>
>> ----------------------------------------------------------------------
>>
>> Date:    Fri, 3 Feb 2017 09:46:06 +0100
>> From:    Frederik Elwert <[hidden email]>
>> Subject: Re: Editing Arabic TEI
>>
>> Dear David,
>>
>> I helped a colleague set up Oxygen for editing an Arabic manuscript. It
>> works quite okay. Indeed, the Author view is more suited for substantial
>> work on the text, but I guess this is an inherent problem of XML for RTL
>> languages: The tags themselves run LTR, the text in between in the other
>> direction, so the cursor swaps direction all the time, making the
>> behaviour slightly unpredictable all the time.
>>
>> Best,
>> Frederik
>>
>>
>>
>>> Am 02.02.2017 um 22:34 schrieb Birnbaum, David J:
>>> Dear TEI-L,
>>>
>>> I don't work with Arabic texts myself, but some of my students and
>>> colleagues do, and one has just asked me to recommend an XML editor. I use
>>> Oxygen for all of my own work, and I've watched the video that SyncroSoft
>>> produced about editing Arabic in the Oxygen Author view, but before I
>>> point my colleague in that and only that direction, I wanted to ask what
>>> others have used to edit Arabic TEI documents. When I've poked at Arabic
>>> in Oxygen before (older versions, though, so the following may no longer
>>> be the case), the Text view sometimes stranded angle brackets in the wrong
>>> place. The Author view obviously (no angle brackets) didn't do that, but
>>> it broke onto new lines in places that made sense from an engineering
>>> perspective, but that made the continuous text harder to read. All in all,
>>> it was usable, but should my colleague also be considering alternatives?
>>>
>>> Thanks,
>>>
>>> David
>>>
>>
>> --
>> Dr. Frederik Elwert
>>
>> Digital Humanities Coordinator
>> Center for Religious Studies
>> Ruhr-University Bochum
>>
>> Universitätsstr. 90a
>> D-44780 Bochum
>>
>> Phone +49(0)234 32-23024
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 09:52:16 +0100
>> From:    Markus Schnöpf <[hidden email]>
>> Subject: Re: Editing Arabic TEI
>>
>> In our project, Corpus Coranicum, and some satellite projects (early Arabic poetry, Paleocoran), we use Oxygen XML for editing the comments. We have also developed a font, Coranica (http://corpuscoranicum.de/about/tools), that displays Arabic text better than standard fonts do (Arabic script is normally too small when used in a mixed writing environment). [We are currently working on an English translation; at the moment it is available only in German, sorry.] We also want to prepare our common editing environment, ediarum (a 'plugin' for Oxygen that gives editors a set of tags normally needed in digital editing), for use with Arabic and other RTL writing systems (see https://github.com/telota/ediarum).
>>
>> Best, Markus
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 10:53:07 +0200
>> From:    Radu Coravu <[hidden email]>
>> Subject: Re: Editing Arabic TEI
>>
>> Hi,
>>
>> As a developer working for Oxygen XML Editor I fully agree with
>> Frederik's analysis. The Author visual editing mode should be more
>> comfortable for RTL editing.
>>
>> About David's remark:
>>
>>> The Author view obviously (no angle brackets) didn't do that, but
>>> it broke onto new lines in places that made sense from an engineering
>>> perspective, but that made the continuous text harder to read.
>>
>> If you have some sample TEI documents and give us some hints about what
>> does not work as expected we could try to improve the behavior in a
>> future version. Unfortunately we do not use RTL writing ourselves so
>> sometimes it's hard for us to understand what the expected editing
>> behaviors should be, that's why we need help with this.
>>
>> Regards,
>> Radu
>>
>> Radu Coravu
>> <oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
>> http://www.oxygenxml.com
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 10:14:27 +0100
>> From:    Gioele Barabucci <[hidden email]>
>> Subject: Re: Editing Arabic TEI
>>
>>> Am 03.02.2017 um 09:53 schrieb Radu Coravu:
>>> If you have some sample TEI documents and give us some hints about what
>>> does not work as expected we could try to improve the behavior in a
>>> future version. Unfortunately we do not use RTL writing ourselves so
>>> sometimes it's hard for us to understand what the expected editing
>>> behaviors should be, that's why we need help with this.
>>
>> Dear Radu, dear participants,
>>
>> we of the Averroes project (Uni of Cologne, DARE, CCeH) [1] have plenty
>> of material I can send you to illustrate the "ergonomic" problems that
>> editors are facing when using oXygen to edit Arabic but also Hebrew
>> texts. I'll contact you privately.
>>
>> Mostly it has to do with a clash of expectations between what happens
>> when letters are typed and how things appear on the screen, for example
>> when Latin characters (tags or punctuation marks) and Arabic characters
>> are on the same line.
>>
>> A concrete example. Suppose that A, B and C are Arabic letters and | is
>> the cursor. If you type "<line>", then A, then B then C, you get the
>> following result (correct)
>>
>> <line>|CBA
>>
>> If, at that point, you type a period, you will get the incorrect
>>
>> <line>|CBA.
>>
>> instead of the correct version
>>
>> <line>|.CBA
>>
>> I used "incorrect", but the behaviour is not really incorrect. As David
>> said, one can see the engineering reasons behind it, but the editors are
>> used to other word processing applications and the behaviour of oXygen
>> just feels wrong to them.
>>
>> This is just an example. There are plenty of more complicated cases I
>> can illustrate. Solving them would improve the quality of life of the
>> editors and transcribers very much. ;)
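
A minimal Python sketch of why the full stop lands where it does, assuming the editor lays out text according to the Unicode Bidirectional Algorithm (UAX #9); whether oXygen does exactly this is an assumption, not something stated in the thread. The three letters below are hypothetical stand-ins for A, B and C. Arabic letters carry the strong class AL, while the full stop (CS) and the angle brackets (ON) have no direction of their own and take it from context. In a left-to-right paragraph, a full stop typed after an Arabic run resolves to left-to-right and is therefore drawn to the right of the Arabic text.

```python
import unicodedata

# Hypothetical sample: three Arabic letters standing in for A, B and C.
ARABIC = "\u0627\u0628\u062c"  # alef, beh, jeem


def bidi_classes(text):
    """Return (character name, bidi class) for each character in text."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]


# Logical (typing) order: an opening tag, the Arabic letters, then a full stop.
for name, cls in bidi_classes("<e>" + ARABIC + "."):
    print(f"{cls:>3}  {name}")
```

The classes alone do not decide the final position; the resolution rules of the algorithm do. The sketch only shows which characters are strong and which are neutral or weak.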
>>
>> Regards,
>>
>> [1] http://averroes.uni-koeln.de/
>>
>> --
>> Gioele Barabucci <[hidden email]>
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 09:18:34 +0000
>> From:    Susan Schreibman <[hidden email]>
>> Subject: visiting fellows, Maynooth University
>>
>> *Visiting Fellowship Scheme*
>>
>> *Faculty of Arts, Celtic Studies and Philosophy*
>>
>> *Maynooth University*
>>
>> *Academic Year 2017-18*
>>
>> The Faculty of Arts, Celtic Studies and Philosophy and /An Foras Feasa/
>> Research Institute at Maynooth University are pleased to announce the
>> call for applications for the Visiting Fellowship Scheme in the
>> Humanities for the academic year 2017-18. We are especially interested
>> in applications from colleagues in the TEI community.
>>
>> The duration of the visiting fellowship is envisaged as ordinarily
>> between one and six months; applications for a shorter or longer
>> duration will be considered. Preference will be given to Fellows whose
>> residence coincides with term time, when students are present, and to
>> Fellows from outside Ireland. Only in exceptional circumstances will
>> Fellowships be awarded to researchers normally resident in Ireland.
>>
>> Fellows will receive office space and office facilities from /An Foras
>> Feasa/ in the Iontas Building, a state-of-the-art humanities research
>> institute, along with full library access and computer facilities. There
>> is a robust and welcoming research culture at Maynooth University and
>> Fellows will be facilitated in achieving their research goals while in
>> residence.
>>
>> Fellows will be asked to provide one seminar or a workshop to
>> postgraduate students in the Researcher’s field of interest, as well as
>> a guest lecture to the University community. A limited number of travel
>> stipends of €500 will be available; preference will be given to
>> applicants with limited institutional funding.
>>
>> The current call will close on *31 March 2017*. Thereafter applications
>> will be considered on a rolling basis. To apply, please complete the
>> form available here:
>> https://www.maynoothuniversity.ie/foras-feasa/visiting-fellowships
>>
>> and return it to [hidden email].
>>
>> For informal queries, please contact Professor Susan Schreibman,
>> Director of An Foras Feasa ([hidden email])
>>
>> Note: The fellowship does not include accommodation. However, short-stay
>> accommodation may be booked through Maynooth Campus Conference and
>> Accommodation (see http://www.maynoothcampus.com) at very reasonable
>> rates. Alternatively, for longer stays, a variety of accommodation is
>> available in the Maynooth vicinity.
>>
>>
>> --
>> Susan Schreibman
>> Professor of Digital Humanities
>> Director of An Foras Feasa
>> Iontas Building
>> Maynooth University
>> Maynooth, Co. Kildare
>>
>> email: [hidden email]
>> phone: +353 1 708 3451
>> fax:  +353 1 708 4797
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 10:26:13 +0100
>> From:    Gioele Barabucci <[hidden email]>
>> Subject: Re: no worries
>>
>>> Am 03.02.2017 um 00:39 schrieb Paterson, Duncan:
>>> The other problem relates to choosing utf8 or utf16 encoding, but I
>>> don't think double byte characters applies to Arabic.
>>
>> Allow me a technical consideration and advice. UTF-16 is only a useless
>> historical artefact. It should not be used in any new project. The only
>> sane choices are UTF-8 and (if really needed) UCS-4.
>>
>> UTF-16 is the worst of all encodings: it wastes bits like UCS-2/4, is
>> incompatible with ASCII and is as computationally hard to work with as
>> UTF-8.
>>
>> One should use either UTF-8 for ASCII compatibility and space savings or
>> UCS-4 for speed of computation (but only under certain particular
>> circumstances).
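
To put rough numbers on the space argument, here is a small Python sketch that counts bytes for an ASCII tag and for a four-letter Arabic word. The sample strings are hypothetical, and Python's `utf-32-le` codec is used as a stand-in for UCS-4 (an assumption; the two coincide for valid Unicode text). The `-le` codec variants are chosen so that no byte order mark is counted.

```python
# Hypothetical sample strings: an ASCII tag and a four-letter Arabic word.
SAMPLES = {"tag": "<line>", "arabic": "\u0643\u062a\u0627\u0628"}


def encoded_sizes(text):
    """Byte length of text in each encoding, without a byte order mark."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for label, text in SAMPLES.items():
    print(label, encoded_sizes(text))
```

Arabic letters take two bytes each in both UTF-8 and UTF-16, so in an Arabic TEI file the size difference between the two comes mostly from the ASCII markup, which UTF-16 doubles.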
>>
>> Regards,
>>
>> --
>> Gioele Barabucci <[hidden email]>
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 10:33:13 +0100
>> From:    Torsten Schassan <[hidden email]>
>> Subject: Re: locus from line 10 of folio 1r to line 5 of folio 3v
>>
>> Dear Pietro,
>>
>> the attribute @n doesn't have the semantics you need: it is just a number,
>> and not necessarily (or predictably!) the number of a line.
>>
>> A solution I can think of would be to define the line either
>>
>> - as a textual line, as part of the "transcription" somewhere in <text> or
>> <sourceDoc>, and refer to it from your <locus>, or
>>
>> - as a zone in some graphical element (surface, graphic), and point to
>> that using @facs.
>>
>> I don't think there's a defined way of including this information in
>> @from/@to of the locus element, although it should be possible to think
>> of the line as a fragment of the page, and thus refer to that fragment
>> the XPointer way: uri#fragmentIdentifier, here e.g. 1r#l10. Still, this
>> would require identifying the fragment somewhere by an @xml:id.
>>
>>
>> Best, Torsten
>>
>>
>>> Am 02.02.2017 um 23:06 schrieb Pietro Liuzzo:
>>> Dear all,
>>>
>>> what is the best way to define the location of a text which starts at a
>>> certain line of a folio and ends at a certain line of another like from
>>> line 10 of folio 1r to line 5 of folio 3v
>>>
>>> <locus from="1r" n="10"/> <locus to="3v" n="5"/> for example?
>>>
>>> thanks a lot!
>>> Pietro
>>>
>>
>>
>> --
>> Torsten Schassan - Digitale Editionen, Abteilung Handschriften und
>> Sondersammlungen
>> Herzog August Bibliothek, Postfach 1364, D-38299 Wolfenbuettel, Tel.:
>> +49-5331-808-130 (Fax -165)
>> Handschriftendatenbank: http://diglib.hab.de/?db=mss
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 13:01:59 +0100
>> From:    Piotr Bański <[hidden email]>
>> Subject: failing to locate the minutes of recent Board meetings
>>
>> Dear Board,
>>
>> I am not able to find the minutes of the meeting ("Business meeting"
>> and/or Board F2F meeting) that was/were held at the last TEI-MM. I have
>> looked for them in the official TEI-C space and in the wiki. I was sure
>> I saw a message from Michelle about the minutes and just put it aside
>> for later, but now I'm beginning to think that this could have been a
>> flashback from after Lyon. My perception of time has gotten very weird
>> over the years.
>>
>> If the minutes are indeed missing, please treat this message as a
>> friendly nudge to get them done at some point :-) And if they are out
>> there and it's just me who fails at locating them, I'll be grateful for
>> the link. I thought that I could use the minutes to solve on my own the
>> issue of the potential mistake in the dates published for the upcoming
>> TEI-MM -- I would love to have the proper dates in my calendar because
>> of the other conference events taking place in the 2nd part of 2017 that
>> I should plan ahead for.
>>
>> Thanks in advance and best regards,
>>
>>   Piotr
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 13:25:17 +0100
>> From:    Franz Fischer <[hidden email]>
>> Subject: RIDE 5 out!
>>
>> Dear TEI community,
>>
>> I am very happy to announce that we just published issue 5 of RIDE, the
>> review journal for digital editions and resources.
>>
>> As in the previous three issues, we have 5 reviews (all in English) that
>> critically assess scholarly digital editions. For your convenience, this
>> is the table of contents:
>>
>> • Jane Austen’s Fiction Manuscripts, by Michelle Levy
>> • Literary drafts, genetic criticism and computational technology. The
>> Beckett Digital Manuscript Project, by Anna-Maria Sichani
>> • Lope de Vega's La Dama Boba. Critical edition and digital archive, by
>> Antonio Rojas Castro
>> • The 1641 Depositions, by Walter Scholger
>> • The William Blake Archive, by Kendal Crawford and Michelle Levy
>>
>> All reviews can be accessed for free via our webpage: http://ride.i-d-e.de
>>
>> Enjoy the ride!
>>
>> Franz
>>
>>
>> --
>> Dr. Franz Fischer
>> Cologne Center for eHumanities
>> Universität zu Köln, Universitätsstr. 22, D-50923 Köln
>> +49 - (0)221 - 470 - 4056
>> [hidden email]
>> @vranzvischer
>>
>> cceh.uni-koeln.de, dixit.uni-koeln.de
>> i-d-e.de, ride.i-d-e.de
>> digitalmedievalist.org, digitalmedievalist.org/journal
>> guillelmus.uni-koeln.de, confessio.ie
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 13:02:21 +0000
>> From:    "Dalmau, Michelle Denise" <[hidden email]>
>> Subject: Re: failing to locate the minutes of recent Board meetings
>>
>> Hi, Piotr,
>>
>> The business meeting minutes will be posted in the next few days. I plan to summarize them and announce the 2017 officers!
>>
>> Thanks for the nudge,
>> --Michelle
>>
>> Typos and Autocorrections Courtesy of my iPhone
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 14:30:14 +0100
>> From:    Franz Fischer <[hidden email]>
>> Subject: Re: locus from line 10 of folio 1r to line 5 of folio 3v
>>
>> If you don't want to point and link to the exact location on the page
>> (and if you want to avoid too much encoding) a simple solution (yet not
>> as powerful as Torsten's suggestion) could be as follows:
>> <locus from="1r" to="3v">fol. 1r, l. 10 to fol. 3v, l. 5</locus>
>>
>> If you prefer to store all information in the markup - not sure about
>> something like this:
>> <locus from="1r_10" to="3v_5"/>
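
If one did adopt the tentative `1r_10` convention, downstream code would have to split the folio from the line number again. The following Python sketch shows one way, assuming values always have the shape folio number, `r` or `v`, underscore, line number. The pattern and the function name `parse_locus` are illustrative assumptions and not part of TEI.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention from the message above: "<folio>_<line>", e.g. "1r_10".
LOCUS_PART = re.compile(r"^(?P<folio>\d+[rv])_(?P<line>\d+)$")


def parse_locus(xml_text):
    """Return ((folio, line), (folio, line)) for @from and @to of a <locus>."""
    locus = ET.fromstring(xml_text)
    result = []
    for attr in ("from", "to"):
        match = LOCUS_PART.match(locus.get(attr, ""))
        if match is None:
            raise ValueError(f"@{attr} does not follow the folio_line pattern")
        result.append((match["folio"], int(match["line"])))
    return tuple(result)


print(parse_locus('<locus from="1r_10" to="3v_5"/>'))
```

The need for such a parser is itself the cost of this approach: the line number is hidden inside an attribute value that other TEI tools will read as an opaque folio label.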
>>
>> Franz
>>
>> --
>> Dr. Franz Fischer
>> Cologne Center for eHumanities
>> Universität zu Köln, Universitätsstr. 22, D-50923 Köln
>> +49 - (0)221 - 470 - 4056
>> [hidden email]
>> @vranzvischer
>>
>> cceh.uni-koeln.de, dixit.uni-koeln.de
>> i-d-e.de, ride.i-d-e.de
>> digitalmedievalist.org, digitalmedievalist.org/journal
>> guillelmus.uni-koeln.de, confessio.ie
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 09:00:07 -0500
>> From:    Syd Bauman <[hidden email]>
>> Subject: XML character encoding (was "Re: [TEI-L] no worries")
>>
>> The sentiment, if a bit overstated, is correct IMHO. Another
>> disadvantage is that an XML file encoded in UTF-16 must begin with a
>> byte order mark (U+FEFF). Does the operating system handle that? Does
>> the XML editor? Do I?
>>
>> The full name for UCS-4 in an XML declaration is "ISO-10646-UCS-4",
>> and this is one of the few places where XML is case insensitive. (So
>> a processor should recognize "ISO-10646-ucs-4" just as well.)
>>
>> All that said, I don't know how to get my operating system to read &
>> write UCS-4 (or even UTF-16, not that I care), so I always use UTF-8.
>> :-|
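
Whether a given file actually starts with the byte order mark can be checked in a few lines. This Python sketch uses the standard `codecs` constants; the sample document string and the helper name `has_utf16_bom` are illustrative assumptions.

```python
import codecs


def has_utf16_bom(data):
    """True if the byte string starts with a UTF-16 byte order mark."""
    return data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


# Hypothetical minimal document, used only to produce some bytes.
doc = '<?xml version="1.0" encoding="UTF-16"?><TEI/>'

print(has_utf16_bom(doc.encode("utf-16")))     # the generic codec writes a BOM
print(has_utf16_bom(doc.encode("utf-16-le")))  # the explicit-endian codec does not
```

The second case is the one to watch for: tools that write "UTF-16" with an explicit byte order may omit the mark that the XML declaration then implies.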
>>
>>> Allow me a technical consideration and advice. UTF-16 is only a
>>> useless historical artefact. It should not be used in any new
>>> project. The only sane choices are UTF-8 and (if really needed)
>>> UCS-4.
>>>
>>> UTF-16 is the worst of all encodings: it wastes bits like UCS-2/4,
>>> is incompatible with ASCII and is as computationally hard to work
>>> with as UTF-8.
>>>
>>> One should use either UTF-8 for ASCII compatibility and space
>>> savings or UCS-4 for speed of computation (but only under certain
>>> particular circumstances).
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 17:05:05 +0100
>> From:    Emmanuel NGUE UM <[hidden email]>
>> Subject: Re: standardizing linguistic encoding
>>
>> Hi,
>>
>> I am an Africa-based linguist, and I have been following many of the
>> discussions on the TEI mailing list.
>>
>> I am not a TEI practitioner per se, but I am aware of the many application
>> scenarios of this technology, including text corpora building.
>>
>> A couple of months ago, I sent an e-mail to the TEI mailing list asking
>> whether anyone knew of any TEI-based or TEI-inspired framework for the
>> encoding of prosodic phenomena such as tones, especially in African tone
>> languages. I got one or two responses from members. Unfortunately, these
>> responses did not address my specific concern.
>>
>> I wish to join on-going discussions about 'standardizing linguistic
>> encoding', to bring to the fore of TEI standards development, the issue of
>> "tone encoding".
>>
>> For the sake of clarification and given that not every one is necessarily
>> an expert in tone languages, let me explain by examples what tone is in
>> African tone languages.
>>
>> Given the following tokens from Basaa, a Bantu language spoken in Cameroon:
>>
>> (1) hól : to sharpen
>>
>> (2) hòl : to pay the dowry
>>
>> (3) hôl (as in *á hôl*): let him sharpen
>>
>> (4) hŏl : pay the dowry! (imperative)
>>
>>
>>
>> 2017-02-02 14:03 GMT+01:00 Eduard Drenth <[hidden email]>:
>>
>>> Thanks for your response! "Standard" in my case means a practical, usable
>>> way of encoding linguistic information in corpora using TEI.
>>>
>>> Indeed, the theme is covered by
>>> https://github.com/LingSIG/wordAttributes/wiki. Good to know of
>>> http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists as well.
>>>
>>> We choose to continue along the chosen path: it doesn't deviate too much
>>> from uncustomized TEI, offers good support for editing and querying,
>>> satisfies our linguists, adheres to http://universaldependencies.org and
>>> is easy to convert to the 'real standard' when it is released.
>>>
>>> Perhaps our approach can be useful input for
>>> https://github.com/LingSIG/wordAttributes; it is the result of quite
>>> extensive testing and discussion.
>>> Eduard Drenth, Software Architekt
>>>
>>> [hidden email]
>>>
>>> Doelestrjitte 8
>>> 8911 DX  Ljouwert
>>> +31 58 234 30 47
>>>
>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>
>>> ________________________________________
>>> From: Piotr Bański <[hidden email]>
>>> Sent: Thursday, February 2, 2017 12:24 PM
>>> To: Eduard Drenth; [hidden email]; Phillip Ströbel
>>> Subject: Re: standardizing linguistic encoding
>>>
>>> Dear Eduard, [also addressing Philip and actually all... ]
>>>
>>> It's probably my conditioning as a member of various standardization
>>> bodies that makes red lights flash in my head upon reading that you are
>>> "developing a standard"... :-) I believe that standards are better
>>> "developed" (or, more precisely, codified) on the basis of existing best
>>> practices or other existing standards. As it is, I can observe that your
>>> proposed encoding mixes up the level of tokens with the level of word
>>> forms (in ISO MAF terminology[1]), and while it can be suitable for your
>>> purposes, it is far from optimal in standardization terms.
>>>
>>> [at this point, the camera pans out]
>>>
>>> This year promises to be quite exciting for the TEI Linguistics SIG[2],
>>> given that:
>>>
>>> (1) ISO LMF [3] is up for renewal and restructuring, and that several
>>> teams (among others, from ENeL, PARTHENOS, CLARIN, and LingSIG) are
>>> currently working on various modules for it,
>>>
>>> (2) ISO Tiger [4] is nearing publication (as in: weeks rather than
>>> months) and opening a way for ISO TEIger, a TEI serialization of the ISO
>>> model for syntactic encoding,
>>>
>>> (3) there is a rising push for streamlining inline linguistic markup,
>>> coming from, among others, Martin Mueller's Early Print Project, BBAW's
>>> existing practice (presented by Susanne Haaf at various TEI meetings),
>>> the Ancient Greek Dependency Treebank (represented in this mailing list
>>> by Giuseppe Celano, I believe), and now we learn of Philip Ströbel's
>>> project and yours. And there are others. A tiny reflex of that is
>>> contained at the LingSIG GitHub space [5], which is only meant as a
>>> _seed_ for collaborative effort rather than any personal statement.
>>>
>>> Andreas Witt and I are thinking of how to address and channel this
>>> boiling mass of initiatives. One possibility could be to target the
>>> upcoming TEI Members Meeting[6] and have a focused pre-conference
>>> workshop designed to formulate a very precise and very concrete proposal
>>> for grammatical encoding synchronized across inline, standoff and
>>> dictionary markup, a proposal that we could submit to the TEI Technical
>>> Council at the end of the day. "The day" seems distant, but if we want
>>> to have a serious proposal at the end of it, work should start about now.
>>>
>>> May I invite all interested parties to join the Linguistics SIG mailing
>>> list (by going to [7]) and GitHub space (by sending me, off-list, your
>>> github username), and to, well, have a go at it... :-)
>>>
>>> Best regards,
>>>
>>>   Piotr
>>>
>>> [1]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=51934
>>> [2]: http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists
>>> [3]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=68516
>>> [4]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62491
>>> [5]: https://github.com/LingSIG/wordAttributes/wiki
>>> [6]: http://members.tei-c.org/Events/meetings
>>>
>>>> On 02/01/17 16:51, Eduard Drenth wrote:
>>>> Dear all,
>>>>
>>>> Here in Holland we are developing a standard to encode linguistic and
>>>> lemma information for various word situations using TEI. We have
>>>> tried several solutions (tei:fs/tei:f, tei:interp, tei:span, ...) and
>>>> finally chose a TEI customization, which gives us standard XSD
>>>> validation, editor support and a simple, focused solution. For linguistic
>>>> terminology we use http://universaldependencies.org/ as much as possible.
>>>>
>>>>
>>>> We are curious as to what you think, see below for details. We hope this
>>>> solution may be of use for those who want to encode linguistic
>>>> information using TEI. Also this may help standardizing linguistic
>>>> encoding in TEI.
>>>>
>>>>
>>>> If this all is worthwhile I would like to donate/publish the solution
>>>> somewhere.
>>>>
>>>>
>>>> snippet customization:
>>>>
>>>>
>>>> <schemaSpec ident="tdb" docLang="en" prefix="tei_" xml:lang="en">
>>>>     ..
>>>>     ..
>>>>     <classSpec type="atts" ident="att.linguistics" module="analytics">
>>>>         <attList>
>>>>             <attDef ident="linguistics" ns="http://www.fryske-akademy.org/grammar/1.0">
>>>>                 <desc>
>>>>                     documentation....
>>>>                 </desc>
>>>>                 <datatype maxOccurs="unbounded">
>>>>                     <dataRef key="teidata.enumerated"/>
>>>>                 </datatype>
>>>>                 <valList type="closed">
>>>>                     <valItem ident="Features.Abbr">
>>>>                         <desc>Boolean feature. Is this an abbreviation?</desc>
>>>>                     </valItem>
>>>>                     <valItem ident="Features.Poss">
>>>>                         <desc>Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.</desc>
>>>>                     </valItem>
>>>>                     <valItem ident="PronType.Prs">
>>>>                         <desc>personal pronoun or determiner</desc>
>>>>                     </valItem>
>>>>                     ..
>>>>                     ..
>>>>
>>>>
>>>> example word encoding:
>>>>
>>>>
>>>> <tei:w fa:linguistics="Pos.NOUN "
>>>> lemmaRef="inprogress://lemmasystem/Hollands/frik/1"
>>>> lemma="frik">Frik</tei:w>
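
As a rough illustration of how such an encoding could be queried, the Python sketch below reads the space-separated values of `fa:linguistics` from each `tei:w`. The `fa` namespace URI is taken from the `attDef` above. The wrapping `tei:s` element, the reduced attribute set and the function name `word_features` are assumptions made only to obtain a well-formed sample.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
FA_NS = "http://www.fryske-akademy.org/grammar/1.0"  # from the attDef above

# Hypothetical wrapper: the message shows only the tei:w element itself.
SAMPLE = f"""<tei:s xmlns:tei="{TEI_NS}" xmlns:fa="{FA_NS}">
  <tei:w fa:linguistics="Pos.NOUN " lemma="frik">Frik</tei:w>
</tei:s>"""


def word_features(xml_text):
    """Return (form, lemma, [feature, ...]) for every tei:w in the input."""
    root = ET.fromstring(xml_text)
    words = []
    for w in root.iter(f"{{{TEI_NS}}}w"):
        # split() with no argument also discards the trailing space in the value
        features = w.get(f"{{{FA_NS}}}linguistics", "").split()
        words.append((w.text, w.get("lemma"), features))
    return words


print(word_features(SAMPLE))
```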
>>>>
>>>>
>>>> example split word encoding:
>>>>
>>>>
>>>> <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
>>>>
>>>> <tei:w fa:linguistics="Pos.ADV "
>>>> lemmaRef="inprogress://lemmasystem/Hollands/al/3"
>>>> lemma="al">al</tei:w><tei:w>wringende</tei:w>
>>>>
>>>> <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
>>>>
>>>> <tei:join result="w" scope="root" lemma="opstean" target="#staet-op-176
>>>> #staet-op-179" lemmaRef="inprogress://lemmasystem/Hollands/opstean/1"
>>>> fa:linguistics="th-si-pa Pos.VERB "/>
>>>>
>>>>
>>>> example of a word consisting of several lemmas (we don't use this yet....):
>>>>
>>>>
>>>> <tei:choice>
>>>>  <tei:orig>
>>>>    <tei:w fa:linguistics=".....">aint</tei:w>
>>>>  </tei:orig>
>>>>  <tei:reg>
>>>>    <tei:w lemma="be" fa:linguistics="...">am</tei:w>
>>>>    <tei:w lemma="not" fa:linguistics="...">not</tei:w>
>>>>  </tei:reg>
>>>> </tei:choice>
>>>>
>>>>
>>>> Bye,
>>>>
>>>>
>>>> Eduard Drenth, Software Architekt
>>>>
>>>>
>>>> [hidden email]
>>>>
>>>>
>>>> Doelestrjitte 8
>>>>
>>>> 8911 DX  Ljouwert
>>>>
>>>> +31 58 234 30 47
>>>>
>>>>
>>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>>
>>>
>>> --
>>> Piotr Bański, Ph.D.
>>> Senior Researcher,
>>> Institut für Deutsche Sprache,
>>> R5 6-13
>>> 68-161 Mannheim, Germany
>>>
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 17:38:23 +0100
>> From:    Emmanuel NGUE UM <[hidden email]>
>> Subject: Re: standardizing linguistic encoding
>>
>> Hi,
>>
>> I am an Africa-based linguist, and I have been following many of the
>> discussions on the TEI mailing list.
>>
>> I am not a TEI practitioner per se, but I am aware of the many application
>> scenarios of this technology, including text corpora building.
>>
>> A couple of months ago, I sent an e-mail to the TEI mailing list asking
>> whether anyone knew of any TEI-based or TEI-inspired framework for the
>> encoding of prosodic phenomena such as tones, especially in African tone
>> languages. I got one or two responses from members. Unfortunately, these
>> responses did not address my specific concern.
>>
>> I wish to join on-going discussions about 'standardizing linguistic
>> encoding', to bring to the fore of TEI standards development, the issue of
>> "tone encoding".
>>
>> For the sake of clarification and given that not every one is necessarily
>> an expert in tone languages, let me explain by examples what tone is in
>> African tone languages.
>>
>> Given the following tokens from Basaa, a Bantu language spoken in Cameroon:
>>
>> (1) hól : to sharpen
>>
>> (2) hòl : to pay the dowry
>>
>> (3) hôl (as in *á hôl*): let him sharpen
>>
>> (4) hŏl : pay the dowry! (imperative)
>>
>> In (1) through (4), the difference in meaning of these words is attributed
>> to the difference in relative pitch level of the syllable: "high" in (1),
>> "low" in (2), contour or two-level "high-low" in (3), contour or two-level
>> "low-high" in (4).
>>
>> While the semantics associated with tone levels in (1) and (2) is lexically
>> encoded, the ones in (3) and (4) are complemented with grammatical
>> information, namely hortative in (3) and imperative in (4), thus resulting
>> in complex (contour) tone shapes in writing.
>>
>> Tone representation in the above examples is graphical, and is meant
>> simply to anchor pitch melody; this form of representation says little
>> about the semantics associated with a specific pitch level within and
>> across words. This is mostly because pitch 'labels' (high, low,
>> low-high, high-low) do not encode persistent meaning, but may instead
>> trigger an array of grammatical information such as tense, mood,
>> aspect, negation, etc., depending on the context.
>>
>> I personally believe that, for better processability and representation of
>> textual information in tone languages, there is a need to develop an
>> unambiguous encoding framework that does not depend on the graphical
>> representation of tones, and I believe TEI to be one possible response to this.
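A minimal sketch of how the graphical tone marks in (1) through (4) could be separated from the segmental base and turned into explicit labels. It assumes the words are typed with Unicode diacritics. The `TONE_MARKS` table and the `split_tones` function are hypothetical names introduced here, and the mapping (circumflex read as "high-low", caron or breve read as "low-high") is an illustrative assumption rather than an established Basaa convention.

```python
import unicodedata

# Assumed mapping from combining diacritics to pitch labels (illustrative only).
TONE_MARKS = {
    "\u0301": "high",      # combining acute accent
    "\u0300": "low",       # combining grave accent
    "\u0302": "high-low",  # combining circumflex, read here as a falling contour
    "\u030c": "low-high",  # combining caron, read here as a rising contour
    "\u0306": "low-high",  # combining breve, often typed in place of the caron
}


def split_tones(word):
    """Return (segmental base, list of pitch labels) for an orthographic word."""
    base, tones = [], []
    # NFD decomposition separates each base letter from its combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            tones.append(TONE_MARKS[ch])
        else:
            base.append(ch)
    return unicodedata.normalize("NFC", "".join(base)), tones


for token in ["h\u00f3l", "h\u00f2l", "h\u00f4l", "h\u014fl"]:
    print(token, split_tones(token))
```

The labels recovered this way only name pitch shapes. As the message points out, the grammatical information a tone carries (hortative, imperative, and so on) would still have to be recorded separately.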
>>
>> Because TEI is an open standard which is meant to be tailored to the
>> specific needs of users, I think it is our responsibility as Africanists and
>> Bantuists to raise the TEI community's awareness of the specificities of the
>> languages we are working on when it comes to standardizing linguistic
>> encoding.
>>
>> Best
>>
>> Emmanuel Ngué Um
>> Language Archivist for ALORA
>>
>> 2017-02-02 14:03 GMT+01:00 Eduard Drenth <[hidden email]>:
>>
>>> Thanks for your response! Standard in my case means a practical, usable way
>>> of encoding linguistic information in corpora using TEI.
>>>
>>> Indeed the theme is covered by
>>> https://github.com/LingSIG/wordAttributes/wiki. Good to know of
>>> http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists as well.
>>>
>>> We have chosen to continue along our current path: it doesn't deviate too
>>> much from uncustomized TEI, offers good support for editing and querying,
>>> satisfies our linguists, adheres to http://universaldependencies.org and
>>> is easy to convert to the 'real standard' when it is released.
>>>
>>> Perhaps our approach can be useful input for
>>> https://github.com/LingSIG/wordAttributes; it is the result of quite
>>> extensive testing and discussion.
>>>
>>> Eduard Drenth, Software Architekt
>>>
>>> [hidden email]
>>>
>>> Doelestrjitte 8
>>> 8911 DX  Ljouwert
>>> +31 58 234 30 47
>>>
>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>
>>> ________________________________________
>>> From: Piotr Bański <[hidden email]>
>>> Sent: Thursday, February 2, 2017 12:24 PM
>>> To: Eduard Drenth; [hidden email]; Phillip Ströbel
>>> Subject: Re: standardizing linguistic encoding
>>>
>>> Dear Eduard, [also addressing Philip and actually all... ]
>>>
>>> It's probably my conditioning as a member of various standardization
>>> bodies that makes red lights flash in my head upon reading that you are
>>> "developing a standard"... :-) I believe that standards are better
>>> "developed" (or, more precisely, codified) on the basis of existing best
>>> practices or other existing standards. As it is, I can observe that your
>>> proposed encoding mixes up the level of tokens with the level of word
>>> forms (in ISO MAF terminology[1]), and while it can be suitable for your
>>> purposes, it is far from optimal in standardization terms.
>>>
>>> [at this point, the camera pans out]
>>>
>>> This year promises to be quite exciting for the TEI Linguistics SIG[2],
>>> given that:
>>>
>>> (1) ISO LMF [3] is up for renewal and restructuring, and that several
>>> teams (among others, from ENeL, PARTHENOS, CLARIN, and LingSIG) are
>>> currently working on various modules for it,
>>>
>>> (2) ISO Tiger [4] is nearing publication (as in: weeks rather than
>>> months) and opening a way for ISO TEIger, a TEI serialization of the ISO
>>> model for syntactic encoding,
>>>
>>> (3) there is a rising push for streamlining inline linguistic markup,
>>> coming from, among others, Martin Mueller's Early Print Project, BBAW's
>>> existing practice (presented by Susanne Haaf at various TEI meetings),
>>> the Ancient Greek Dependency Treebank (represented in this mailing list
>>> by Giuseppe Celano, I believe), and now we learn of Philip Ströbel's
>>> project and yours. And there are others. A tiny reflex of that is
>>> contained at the LingSIG GitHub space [5], which is only meant as a
>>> _seed_ for collaborative effort rather than any personal statement.
>>>
>>> Andreas Witt and I are thinking of how to address and channel this
>>> boiling mass of initiatives. One possibility could be to target the
>>> upcoming TEI Members Meeting[6] and have a focused pre-conference
>>> workshop designed to formulate a very precise and very concrete proposal
>>> for grammatical encoding synchronized across inline, standoff and
>>> dictionary markup, a proposal that we could submit to the TEI Technical
>>> Council at the end of the day. "The day" seems distant, but if we want
>>> to have a serious proposal at the end of it, work should start about now.
>>>
>>> May I invite all interested parties to join the Linguistics SIG mailing
>>> list (by going to [7]) and GitHub space (by sending me, off-list, your
>>> github username), and to, well, have a go at it... :-)
>>>
>>> Best regards,
>>>
>>>   Piotr
>>>
>>> [1]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=51934
>>> [2]: http://wiki.tei-c.org/index.php/SIG:TEI_for_Linguists
>>> [3]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=68516
>>> [4]: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62491
>>> [5]: https://github.com/LingSIG/wordAttributes/wiki
>>> [6]: http://members.tei-c.org/Events/meetings
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On 02/01/17 16:51, Eduard Drenth wrote:
>>>> Dear all,
>>>>
>>>> Here in Holland we are developing a standard to encode linguistic and
>>>> lemma information for various word situations using TEI. We have
>>>> tried several solutions (tei:fs/tei:f, tei:interp, tei:span, ...) and
>>>> finally chose TEI customization, which gives us standard XSD
>>>> validation, editor support and a simple, focused solution. For linguistic
>>>> terminology we use http://universaldependencies.org/ as much as possible.
>>>>
>>>>
>>>> We are curious as to what you think, see below for details. We hope this
>>>> solution may be of use for those who want to encode linguistic
>>>> information using TEI. Also this may help standardizing linguistic
>>>> encoding in TEI.
>>>>
>>>>
>>>> If this all is worthwhile I would like to donate/publish the solution
>>>> somewhere.
>>>>
>>>>
>>>> snippet customization:
>>>>
>>>>
>>>> <schemaSpec ident="tdb" docLang="en" prefix="tei_" xml:lang="en">
>>>>   ..
>>>>   ..
>>>>   <classSpec type="atts" ident="att.linguistics" module="analytics">
>>>>     <attList>
>>>>       <attDef ident="linguistics" ns="http://www.fryske-akademy.org/grammar/1.0">
>>>>         <desc>
>>>>           documentation....
>>>>         </desc>
>>>>         <datatype maxOccurs="unbounded">
>>>>           <dataRef key="teidata.enumerated"/>
>>>>         </datatype>
>>>>         <valList type="closed">
>>>>           <valItem ident="Features.Abbr">
>>>>             <desc>Boolean feature. Is this an abbreviation?</desc>
>>>>           </valItem>
>>>>           <valItem ident="Features.Poss">
>>>>             <desc>Boolean feature of pronouns, determiners or
>>>>               adjectives. It tells whether the word is possessive.</desc>
>>>>           </valItem>
>>>>           <valItem ident="PronType.Prs">
>>>>             <desc>personal pronoun or determiner</desc>
>>>>           </valItem>
>>>>           ..
>>>>           ..
>>>>
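A rough sketch of what a closed value list combined with `maxOccurs="unbounded"` implies for processing: the attribute value is a whitespace-separated list of tokens, each of which should be one of the declared identifiers. The `ALLOWED` set below is an assumption made for illustration. Its first three entries come from the customization snippet and the `Pos.*` entries from the word examples in the same message; the full list is elided in the original, and `check_linguistics` is a hypothetical helper name.

```python
# Assumed subset of the closed value list (the real list is elided in the message).
ALLOWED = {
    "Features.Abbr",
    "Features.Poss",
    "PronType.Prs",
    "Pos.NOUN",
    "Pos.ADV",
    "Pos.VERB",
}


def check_linguistics(value):
    """Split an fa:linguistics value into tokens and report unknown ones."""
    # str.split() with no argument also discards the trailing space
    # seen in values such as "Pos.NOUN ".
    tokens = value.split()
    unknown = [t for t in tokens if t not in ALLOWED]
    return tokens, unknown


print(check_linguistics("Pos.NOUN "))
print(check_linguistics("th-si-pa Pos.VERB "))
```

In this sketch `th-si-pa` is reported as unknown only because the assumed subset does not include it; a schema generated from the complete customization would perform the same check during validation.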
>>>>
>>>> example word encoding:
>>>>
>>>>
>>>> <tei:w fa:linguistics="Pos.NOUN "
>>>> lemmaRef="inprogress://lemmasystem/Hollands/frik/1"
>>>> lemma="frik">Frik</tei:w>
>>>>
>>>>
>>>> example split word encoding:
>>>>
>>>>
>>>> <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
>>>>
>>>> <tei:w fa:linguistics="Pos.ADV "
>>>> lemmaRef="inprogress://lemmasystem/Hollands/al/3"
>>>> lemma="al">al</tei:w><tei:w>wringende</tei:w>
>>>>
>>>> <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
>>>>
>>>> <tei:join result="w" scope="root" lemma="opstean" target="#staet-op-176
>>>> #staet-op-179" lemmaRef="inprogress://lemmasystem/Hollands/opstean/1"
>>>> fa:linguistics="th-si-pa Pos.VERB "/>
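The split-word example can be read back programmatically by following the `target` pointers of `tei:join`. The following is a minimal sketch, assuming the words sit inside some wrapper element (a `tei:s` is used here purely for illustration) and leaving out the `fa:linguistics` and `lemmaRef` attributes for brevity. `resolve_joins` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Cut-down version of the split-word example; the <tei:s> wrapper is assumed.
SAMPLE = f"""<tei:s xmlns:tei="{TEI}">
  <tei:w xml:id="staet-op-176" rendition="#split">staet</tei:w>
  <tei:w lemma="al">al</tei:w><tei:w>wringende</tei:w>
  <tei:w xml:id="staet-op-179" rendition="#split">op</tei:w>
  <tei:join result="w" scope="root" lemma="opstean"
            target="#staet-op-176 #staet-op-179"/>
</tei:s>"""


def resolve_joins(root):
    """Return (lemma, [surface parts]) for every tei:join under root."""
    # Index every element that carries an xml:id so pointers can be followed.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for join in root.iter(f"{{{TEI}}}join"):
        # Each pointer in @target has the form "#some-id".
        parts = [by_id[ref.lstrip("#")] for ref in join.get("target").split()]
        resolved.append((join.get("lemma"), [p.text for p in parts]))
    return resolved


root = ET.fromstring(SAMPLE)
print(resolve_joins(root))
```

The order of the parts follows the order of the pointers in `target`, not document order, so a consumer that needs the surface order should sort by position separately.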
>>>>
>>>>
>>>> example of a word consisting of several lemmas (we don't use this yet):
>>>>
>>>>
>>>> <tei:choice>
>>>>  <tei:orig>
>>>>    <tei:w fa:linguistics=".....">aint</tei:w>
>>>>  </tei:orig>
>>>>  <tei:reg>
>>>>    <tei:w lemma="be" fa:linguistics="...">am</tei:w>
>>>>    <tei:w lemma="not" fa:linguistics="...">not</tei:w>
>>>>  </tei:reg>
>>>> </tei:choice>
>>>>
>>>>
>>>> Bye,
>>>>
>>>>
>>>> Eduard Drenth, Software Architekt
>>>>
>>>>
>>>> [hidden email]
>>>>
>>>>
>>>> Doelestrjitte 8
>>>>
>>>> 8911 DX  Ljouwert
>>>>
>>>> +31 58 234 30 47
>>>>
>>>>
>>>> gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43
>>>>
>>>
>>> --
>>> Piotr Bański, Ph.D.
>>> Senior Researcher,
>>> Institut für Deutsche Sprache,
>>> R5 6-13
>>> 68-161 Mannheim, Germany
>>>
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 18:12:48 +0100
>> From:    Piotr Bański <[hidden email]>
>> Subject: Re: standardizing linguistic encoding
>>
>> Hi Emmanuel,
>>
>> It's great to hear from you. You may be pleased to hear about a set of
>> tools that can be used to encode the information you need very
>> precisely, and to attach that information to objects of any granularity
>> ("standard" tokens, morphs, phrases) and any sort (orthographic,
>> prosodic, morphological). These tools have been defined jointly by the
>> TEI and ISO, and the TEI description is free and well-tested, and you
>> can read about it in the chapter on feature structures:
>>
>> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html
>>
>> (I'd suggest skipping 18.11 when reading this chapter for the first time)
>>
>> These tools make it possible for you to describe practically any feature
>> matrices needed in linguistics. And the TEI has mechanisms for attaching
>> them to linguistic/textual objects.
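As a hedged illustration of the feature-structure approach described above, the sketch below builds a TEI `fs` for the hortative form in Emmanuel's example (3) and points a `w` element at it through `ana`. The elements `fs`, `f` and `symbol` come from the feature structures chapter; the feature names `pitch` and `mood`, their values, the `type` value and the identifier `tone-HL-hort` are assumptions made up for this example, not an agreed vocabulary. `tone_fs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("tei", TEI_NS)


def tone_fs(fs_id, features):
    """Build a TEI <fs> holding one <f>/<symbol> pair per feature."""
    fs = ET.Element(f"{{{TEI_NS}}}fs", {f"{{{XML_NS}}}id": fs_id, "type": "tone"})
    for name, value in features.items():
        f = ET.SubElement(fs, f"{{{TEI_NS}}}f", {"name": name})
        ET.SubElement(f, f"{{{TEI_NS}}}symbol", {"value": value})
    return fs


# Hypothetical feature names and values, chosen only for illustration.
fs = tone_fs("tone-HL-hort", {"pitch": "high-low", "mood": "hortative"})

# The word itself carries no diacritic; @ana points at the feature structure.
w = ET.Element(f"{{{TEI_NS}}}w", {"ana": "#tone-HL-hort"})
w.text = "hol"

print(ET.tostring(fs, encoding="unicode"))
print(ET.tostring(w, encoding="unicode"))
```

Keeping the pitch shape and the grammatical value as separate features is one way to avoid tying meaning to the diacritic; whether the tone should also remain in the transcription is a project decision.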
>>
>> If you'd like to pursue this further, you may be interested in joining
>> the TEI Linguistics SIG mailing list at
>>
>> https://listserv.brown.edu/archives/cgi-bin/wa?A0=TEI-LINGUISTICS
>>
>> where we have just talked about cases slightly more complex than the
>> kind of attribute-based markup discussed here.
>>
>> Best regards,
>>
>>   Piotr
>>
>>
>> --
>> Piotr Bański, Ph.D.
>> Senior Researcher,
>> Institut für Deutsche Sprache,
>> R5 6-13
>> 68-161 Mannheim, Germany
>>
>> ------------------------------
>>
>> Date:    Fri, 3 Feb 2017 22:05:14 +0000
>> From:    "Birnbaum, David J" <[hidden email]>
>> Subject: Re: Editing Arabic TEI
>>
>> Dear TEI-L,
>>
>> Thanks very much to all who responded to my inquiry about editing Arabic
>> in <oXygen/>, and I have passed the information along to the colleague on
>> whose behalf I was asking.
>>
>> Best,
>>
>> David
>> __
>>
>> On 2017-03-02, 4:14 AM, "TEI (Text Encoding Initiative) public discussion
>> list on behalf of Gioele Barabucci" <[hidden email] on behalf of
>> [hidden email]> wrote:
>>
>>>> Am 03.02.2017 um 09:53 schrieb Radu Coravu:
>>>> If you have some sample TEI documents and give us some hints about what
>>>> does not work as expected we could try to improve the behavior in a
>>>> future version. Unfortunately we do not use RTL writing ourselves so
>>>> sometimes it's hard for us to understand what the expected editing
>>>> behaviors should be, that's why we need help with this.
>>>
>>> Dear Radu, dear participants,
>>>
>>> we of the Averroes project (Uni of Cologne, DARE, CCeH) [1] have plenty
>>> of material I can send you to illustrate the "ergonomic" problems that
>>> editors are facing when using oXygen to edit Arabic but also Hebrew
>>> texts. I'll contact you privately.
>>>
>>> Mostly it has to do with a clash of expectations between what happens
>>> when letters are typed and how things appear on the screen, for example
>>> when Latin characters (tags or punctuation marks) and Arabic characters
>>> are on the same line.
>>>
>>> A concrete example. Suppose that A, B and C are Arabic letters and | is
>>> the cursor. If you type "<line>", then A, then B then C, you get the
>>> following result (correct)
>>>
>>> <line>|CBA
>>>
>>> If, at that point, you type a period, you will get the incorrect
>>>
>>> <line>|CBA.
>>>
>>> instead of the correct version
>>>
>>> <line>|.CBA
>>>
>>> I used "incorrect", but the behaviour is not really incorrect. As David
>>> said, one can see the engineering reasons behind it, but the editors are
>>> used to other word processing applications and the behaviour of oXygen
>>> just feels wrong to them.
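A small sketch of the "engineering reasons" mentioned above, using the bidirectional character classes exposed by Python's `unicodedata` module. The sample string is an assumption standing in for the `<line>` example: three Arabic letters (alef, beh, jeem) followed by a full stop. `bidi_classes` is a hypothetical helper name.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# "<line>" + alef, beh, jeem + full stop, in logical (typing) order.
sample = "<line>\u0627\u0628\u062c."

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

The Latin letters of the tag are strongly left-to-right (L) and the Arabic letters strongly right-to-left (AL), while the angle brackets (ON) and the full stop (CS) have no strong direction of their own. Where such characters end up on screen depends on their neighbours and on the base direction of the line, which is a plausible reason why a period typed after Arabic text in a left-to-right line lands on the side the editor does not expect.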
>>>
>>> This is just an example. There are plenty of more complicated cases I
>>> can illustrate. Solving them would improve the quality of life of the
>>> editors and transcribers very much. ;)
>>>
>>> Regards,
>>>
>>> [1] http://averroes.uni-koeln.de/
>>>
>>> --
>>> Gioele Barabucci <[hidden email]>
>>
>> ------------------------------
>>
>> End of TEI-L Digest - 2 Feb 2017 to 3 Feb 2017 (#2017-28)
>> *********************************************************