LIUM’s SMT Machine Translation Systems for WMT 2012

This paper describes the development of the French-English and English-French statistical machine translation systems for the WMT 2012 shared task evaluation. We developed phrase-based systems using the Moses decoder, trained on the provided data only. New features this year include improved adaptation of the language and translation models, using cross-entropy scores to select the training corpora.
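The abstract does not spell out the exact selection criterion. A common cross-entropy criterion for this kind of data selection is the cross-entropy difference of Moore and Lewis (2010): each candidate sentence s is scored as H_in(s) - H_gen(s), where H_M(s) = -(1/|s|) Σ log2 P_M(w_i) is the per-word cross-entropy under model M. Sentences with the lowest scores resemble the in-domain data more than the general data and are kept. The sketch below assumes that criterion and uses add-one-smoothed unigram models for brevity, whereas a real system would use n-gram language models. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count tokens; returns (counts, total_tokens) for an add-one unigram LM."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram LM."""
    counts, total = model
    tokens = sentence.split()
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / max(len(tokens), 1)  # guard against empty lines


def select_sentences(pool, in_domain, general, keep):
    """Hypothetical helper: keep the `keep` pool sentences with the lowest
    cross-entropy difference H_in(s) - H_gen(s) (Moore-Lewis style)."""
    vocab_size = len({t for s in pool + in_domain + general for t in s.split()})
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)

    def score(s):
        # Negative score: the sentence looks more in-domain than general.
        return cross_entropy(s, lm_in, vocab_size) - cross_entropy(s, lm_gen, vocab_size)

    return sorted(pool, key=score)[:keep]


if __name__ == "__main__":
    # Toy data standing in for in-domain, general-domain and candidate corpora.
    in_domain = ["the parliament adopted the resolution",
                 "the commission proposed the directive"]
    general = ["the cat sat on the mat", "i like football games"]
    pool = ["the parliament proposed the directive",
            "the cat likes football",
            "my dog sat on the mat"]
    print(select_sentences(pool, in_domain, general, keep=2))
    # prints ['the parliament proposed the directive', 'the cat likes football']
```

Scoring by the difference, rather than by in-domain cross-entropy alone, penalizes sentences that are simply short or made of very frequent words, which would otherwise score well under any language model.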
