Investigations on Translation Model Adaptation Using Monolingual Data

Most of the freely available parallel data for training the translation model of a statistical machine translation system comes from very specific sources (European Parliament, United Nations, etc.). Therefore, there is increasing interest in methods for adapting the translation model. A popular approach is based on unsupervised training, also called self-enhancing, which uses only monolingual data to adapt the translation model. In this paper, we extend previous work and provide new insight into the existing methods. We report results on translation between French and English. Improvements of up to 0.5 BLEU were observed with respect to a very competitive baseline trained on more than 280M words of human-translated parallel data.
