Character-Based Pivot Translation for Under-Resourced Languages and Domains

In this paper we investigate the use of character-level translation models to support translation to and from under-resourced languages and textual domains via closely related pivot languages. Our experiments show that these low-level models can be successful even with very small amounts of training data. We test the approach on movie subtitles for three language pairs and on legal texts for another language pair in a domain adaptation task. Our pivot translations outperform the baselines by a large margin.
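As an illustration of the general idea, the sketch below shows one way a sentence could be turned into a sequence of character tokens for a character-level model and then chained with a second system through a pivot language. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the underscore space marker, the function names, and the two translation callables (`char_model`, `pivot_model`) are hypothetical placeholders.

```python
from typing import Callable

# Assumed placeholder for original word boundaries; the input must not
# already contain this character, or the round trip will not be exact.
SPACE_MARK = "_"


def to_char_tokens(sentence: str) -> str:
    """Split a sentence into space-separated characters, marking word boundaries."""
    words = sentence.split()
    return " ".join(SPACE_MARK.join(words))


def from_char_tokens(char_sentence: str) -> str:
    """Rejoin character tokens into words, restoring the original spaces."""
    return "".join(char_sentence.split()).replace(SPACE_MARK, " ")


def pivot_translate(
    sentence: str,
    char_model: Callable[[str], str],   # hypothetical: source chars -> pivot chars
    pivot_model: Callable[[str], str],  # hypothetical: pivot sentence -> target sentence
) -> str:
    """Translate via a closely related pivot language in two steps."""
    pivot_chars = char_model(to_char_tokens(sentence))
    pivot_sentence = from_char_tokens(pivot_chars)
    return pivot_model(pivot_sentence)


if __name__ == "__main__":
    print(to_char_tokens("god dag"))
    # Identity functions stand in for trained models here.
    print(pivot_translate("god dag", lambda s: s, lambda s: s))
```

In practice the two callables would wrap trained translation systems; the identity functions above only exercise the tokenization round trip.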
