Translating from under-resourced languages: comparing direct transfer against pivot translation

In this paper we compare two methods for translating into English from languages for which few MT resources have been developed (e.g. Ukrainian). The first method involves direct transfer using an MT system that is available for this language pair. The second method involves translation via a cognate language, which has more translation resources and one or more advanced translation systems (e.g. Russian for Slavonic languages). The comparison shows that it is possible to achieve better translation quality via the pivot language, by leveraging the advanced dictionaries and grammars available for it and the lexical and syntactic similarities between the source and pivot languages. The results suggest that MT development efforts can be efficiently reused for families of closely related languages, and that investing in MT for closely related languages can be more productive than developing systems from scratch for new translation directions. We also suggest a method for comparing the performance of direct and pivot translation routes via automated evaluation of segments with varying translation difficulty.
