Text and data mining for minority languages

Ásta Helgadóttir
Dear all,

I am looking for examples of applications that use text and data
mining of natural languages, preferably minority or small languages
such as Finnish (Suomi), Icelandic, Galician, etc. Examples could be
applications that help visually impaired speakers of a minority
language, that support some kind of cultural preservation, or that make
it easier to analyse or compare the minority language in some way.

The reason is that an opinion paper on language equality in the digital
age is being discussed in the European Parliament. The political
groups are skeptical of the relevance of text and data mining for
language in the digital age, and I am working on finding projects that
demonstrate its usefulness as a means of development and preservation of
languages, especially smaller languages. My objective is to contact the
relevant MEPs to explain the importance of TDM for language in the
digital world.

If anything comes to mind, please let me know. I would also be very
grateful for any additional information that might be relevant.

Best regards,
Ásta Helgadóttir
SPARC Europe | Copyright Policy Advisor
TW: @asta_fish | tel: +354 66 12 12 8
www.sparceurope.org

Re: Text and data mining for minority languages

Stuart A. Yeates
Well, two of the three languages you mention are supported by Google's CLD (https://github.com/CLD2Owners/cld2). CLD makes it very easy to detect the language of items in mixed-language corpora (or mixed-language documents). It is widely integrated into software systems (Chrome, Mastodon, etc.) and appears to be very reliable.
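
To give a rough feel for what "detecting the language of items from a mixed-language corpus" means, here is a minimal sketch in Python. It does not use CLD at all: it swaps in a toy stopword-counting detector so that it runs without any dependencies. The stopword lists, the sample sentences, and the names `detect_language` and `split_corpus` are all made up for illustration, and a real detector such as CLD relies on trained statistical models rather than hand-written word lists.

```python
# Toy language identification by counting stopword hits.
# ASSUMPTION: these tiny word lists are hand-picked for illustration only;
# they are not taken from CLD or any other real detector.
STOPWORDS = {
    "fi": {"ja", "on", "ei", "että", "se", "oli", "kuin", "mutta"},
    "is": {"og", "að", "er", "ekki", "það", "sem", "við", "en"},
    "gl": {"e", "de", "que", "non", "unha", "o", "a", "para"},
}


def detect_language(text):
    """Return the language code with the most stopword hits, or 'unknown'."""
    # Lowercase, split on whitespace, and trim common punctuation.
    tokens = [tok.strip(".,;:!?") for tok in text.lower().split()]
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def split_corpus(lines):
    """Group the lines of a mixed-language corpus by detected language."""
    groups = {}
    for line in lines:
        groups.setdefault(detect_language(line), []).append(line)
    return groups


if __name__ == "__main__":
    corpus = [
        "Kissa on talossa ja koira ei ole",
        "Það er ekki kalt og við förum út",
        "O can non quere que a nena vaia para a casa",
    ]
    for lang, items in split_corpus(corpus).items():
        print(lang, items)
```

A word-list approach like this breaks down quickly on short or noisy text, which is exactly where a trained detector earns its keep; the sketch is only meant to show the shape of the task.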

cheers
stuart

--
...let us be heard from red core to black sky

In reply to the message from Ásta Helgadóttir of 13 April 2018, 01:30.