Translation Model Adaptation for an Arabic/French News Translation System by Lightly- Supervised Training

Published in MT Summit XII, August 2009 Most of the existing, easily available parallel texts to train a statistical machine translation system are from international organizations that use a particular jargon. In this paper, we consider the automatic adaptation of such a translation model to the news domain. The initial system was trained on more than 200M words of UN bitexts. We then explore large amounts of in-domainmonolingual texts to modify the probability distribution of the phrase-table and to learn new task-specific phrase-pairs. This procedure achieved an improvement of 3.5 points BLEU on the test set in an Arabic/French statistical machine translation system. This result compares favorably with other large state-of-the-art systems for this language pair.

[1]  Haizhou Li,et al.  Exploiting N-best Hypotheses for SMT Self-Enhancement , 2008, ACL.

[2]  Alex Waibel,et al.  Adaptation of the translation model for statistical machine translation based on information retrieval , 2005, EAMT.

[3]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[4]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[5]  Roland Kuhn,et al.  Mixture-Model Adaptation for SMT , 2007, WMT@ACL.

[6]  Philipp Koehn,et al.  (Meta-) Evaluation of Machine Translation , 2007, WMT@ACL.

[7]  Stephan Vogel,et al.  Parallel Implementations of Word Alignment Tool , 2008, SETQALNLP.

[8]  Hermann Ney,et al.  A Multi-Genre SMT System for Arabic to French , 2008, LREC.

[9]  Holger Schwenk,et al.  Investigations on large-scale lightly-supervised training for statistical machine translation. , 2008, IWSLT.

[10]  Alfons Juan-Císcar,et al.  Domain Adaptation in Statistical Machine Translation with Mixture Modelling , 2007, WMT@ACL.

[11]  Marcello Federico,et al.  Phrase-based statistical machine translation with pivot languages. , 2008, IWSLT.

[12]  Richard M. Schwartz,et al.  Language and Translation Model Adaptation using Comparable Corpora , 2008, EMNLP.

[13]  Philipp Koehn,et al.  Further Meta-Evaluation of Machine Translation , 2008, WMT@ACL.

[14]  Nizar Habash,et al.  Permission is granted to quote short excerpts and to reproduce figures and tables from this report, provided that the source of such material is fully acknowledged. Arabic Preprocessing Schemes for Statistical Machine Translation , 2006 .

[15]  Nicola Ueffing,et al.  Using monolingual source-language data to improve MT performance , 2006, IWSLT.

[16]  Philipp Koehn,et al.  Experiments in Domain Adaptation for Statistical Machine Translation , 2007, WMT@ACL.

[17]  Gholamreza Haffari,et al.  Transductive learning for statistical machine translation , 2007, ACL.

[18]  Stephan Vogel,et al.  Language Model Adaptation for Statistical Machine Translation via Structured Query Models , 2004, COLING.