Combining Multi-Engine Machine Translation and Online Learning through Dynamic Phrase Tables

Phrase-based Statistical Machine Translation (SMT) systems have been extended with a second, dynamic phrase table for several purposes. Promising results have been reported for hybrid or multi-engine machine translation, i.e. building a phrase table from the output of external MT systems, and for online learning. We argue that prior research does not score dynamic phrase tables optimally: such tables can be small, which makes the Maximum Likelihood Estimation of translation probabilities unreliable. We propose basing the scores on frequencies from both the dynamic corpus and the primary corpus instead, and show that this modification significantly increases performance. We also explore the combination of multi-engine MT and online learning.
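The scoring argument can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the pooling below (simply adding phrase-pair counts from both corpora before normalising) is an assumption made for illustration, as are the function names and the toy counts.

```python
from collections import Counter


def mle_scores(pair_counts):
    """Relative-frequency estimate p(t|s) = c(s, t) / c(s) from one table."""
    source_totals = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n
    return {(s, t): n / source_totals[s] for (s, t), n in pair_counts.items()}


def pooled_scores(dynamic_counts, primary_counts):
    """Score the dynamic table's phrase pairs with counts from both corpora.

    Assumption: counts are pooled by plain addition; the paper may weight
    or combine them differently.
    """
    pooled = Counter(primary_counts)
    pooled.update(dynamic_counts)  # adds counts for pairs seen in both
    source_totals = Counter()
    for (src, _tgt), n in pooled.items():
        source_totals[src] += n
    # Only pairs present in the dynamic table are returned.
    return {pair: pooled[pair] / source_totals[pair[0]] for pair in dynamic_counts}


# Hypothetical toy counts: the primary corpus strongly prefers "house".
primary = {("Haus", "house"): 98, ("Haus", "home"): 2}
# The dynamic corpus contains a single observation.
dynamic = {("Haus", "home"): 1}

dynamic_only = mle_scores(dynamic)
pooled = pooled_scores(dynamic, primary)
print(dynamic_only[("Haus", "home")])  # overconfident estimate from one count
print(pooled[("Haus", "home")])        # tempered by the primary corpus
```

With a single observation, the estimate from the dynamic table alone assigns the pair a probability of 1.0, whereas the pooled estimate is 3/101, which reflects how rare the pair is in the larger corpus.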
