Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation

This paper extends previous work on the enlargement of the monolingual dictionaries of rule-based machine translation systems by non-expert users, so that it tackles the complete task of adding both source-language and target-language words to the monolingual dictionaries and the bilingual dictionary. In the original method, users validate whether some suffix variations of the word to be inserted are correct, in order to find the most appropriate inflection paradigm. This method is now improved by exploiting the strong correlation detected between paradigms in both languages, which reduces the search space of the target-language paradigm once the source-language paradigm is known. Results show that, when the source-language word has already been inserted, the system predicts the right target-language paradigm more accurately, and the number of queries posed to users is significantly reduced. Experiments also show that, when the source language and the target language are not closely related, only the source-language part-of-speech category helps to correctly classify the target-language word; the rest of the information provided by the source-language paradigm does not.
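The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy paradigms, the co-occurrence counts, the paradigm names, and the helper functions `candidates`, `restrict` and `identify` are all hypothetical, and the query strategy (ask about the first form that some but not all candidates generate) is a simple stand-in for whatever strategy the actual system uses. The sketch only shows how knowing the source-language paradigm can shrink the set of target-language candidates and, with it, the number of yes/no questions put to the user.

```python
# Hypothetical toy target-language paradigms: name -> suffixes attached to a stem.
PARADIGMS = {
    "noun_s": ["", "s"],
    "verb_reg": ["", "s", "ed", "ing"],
    "adj_er": ["", "er", "est"],
}

# Hypothetical counts of (source paradigm, target paradigm) pairs, as they
# might be collected from entries already present in a bilingual dictionary.
COOCCURRENCE = {("sl_verb", "verb_reg"): 12, ("sl_noun", "noun_s"): 30}


def candidates(word, paradigms):
    """All (paradigm, stem) pairs such that stem + one of the suffixes == word."""
    found = set()
    for name, suffixes in paradigms.items():
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                found.add((name, word[: len(word) - len(suffix)]))
    return sorted(found)


def forms(candidate, paradigms):
    """Every surface form the candidate (paradigm, stem) pair would generate."""
    name, stem = candidate
    return {stem + suffix for suffix in paradigms[name]}


def restrict(cands, sl_paradigm, cooccurrence):
    """Keep candidates whose paradigm has been seen with the source paradigm."""
    kept = [c for c in cands if cooccurrence.get((sl_paradigm, c[0]), 0) > 0]
    return kept or cands  # fall back to the full set if nothing matches


def identify(cands, paradigms, is_valid):
    """Ask yes/no questions until the candidates can no longer be told apart.

    Returns the remaining candidates and the number of queries asked.
    """
    remaining = list(cands)
    queries = 0
    while len(remaining) > 1:
        generated = [forms(c, paradigms) for c in remaining]
        discriminating = sorted(set().union(*generated) - set.intersection(*generated))
        if not discriminating:
            break
        form = discriminating[0]
        queries += 1
        answer = is_valid(form)  # the user's judgement on this form
        remaining = [c for c, f in zip(remaining, generated) if (form in f) == answer]
    return remaining, queries


# A simulated user who knows the correct inflected forms of the new word.
def is_valid(form):
    return form in {"walk", "walks", "walked", "walking"}


all_cands = candidates("walks", PARADIGMS)
print(identify(all_cands, PARADIGMS, is_valid))
print(identify(restrict(all_cands, "sl_verb", COOCCURRENCE), PARADIGMS, is_valid))
```

In this toy setting, both calls settle on the same stem and paradigm, but the restricted candidate set needs fewer questions, which is the effect the abstract reports on a much larger scale.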
