Oracle-based Training for Phrase-based Statistical Machine Translation

A Statistical Machine Translation (SMT) system generates an n-best list of candidate translations for each source sentence. A model error occurs when the most probable translation (the 1-best) produced by the SMT decoder is not the candidate most similar to the human reference translation(s) (the oracle). In this paper we investigate the parametric differences between the 1-best and the oracle translation and attempt to close this gap by proposing two rescoring strategies that push the oracle up the n-best list. We observe modest improvements in METEOR scores over a baseline SMT system trained on French–English Europarl corpora. We also present a detailed analysis of the oracle rankings to determine the source of model errors, which in turn has the potential to improve overall system performance.
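To make the notion of an oracle concrete, the sketch below picks the oracle out of an n-best list as the candidate that scores highest against the reference under a sentence-level similarity metric. This is a minimal illustration, not the paper's implementation: the function names (sentence_bleu, find_oracle) are ours, and a simple add-one-smoothed sentence-level BLEU stands in for the METEOR-based similarity the paper actually uses.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All n-grams of a token list, as a Counter."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing on the n-gram precisions,
    so that a missing higher-order match does not zero out the score.
    Tokens are whitespace-split; similarity metric is an assumption here."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(hyp, n) & ngrams(ref, n)).values())
        total = max(len(hyp) - n + 1, 0)
        log_prec += math.log((overlap + 1) / (total + 1))
    # brevity penalty: penalize hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec / max_n)

def find_oracle(nbest, reference):
    """Return (rank, candidate, score) of the candidate most similar to the
    reference. A model error is any sentence where rank != 0, i.e. the
    decoder's 1-best is not the oracle."""
    scored = [(i, hyp, sentence_bleu(hyp, reference)) for i, hyp in enumerate(nbest)]
    return max(scored, key=lambda t: t[2])

# Toy example: the decoder's 1-best (index 0) is not the oracle here.
nbest = [
    "the commission has adopted the proposal",
    "the commission adopted the proposal today",
]
reference = "the commission adopted the proposal today"
rank, oracle, score = find_oracle(nbest, reference)
print(f"oracle at rank {rank} (score {score:.3f}): {oracle}")
```

In this framing, the rescoring strategies the paper proposes amount to adjusting the model's scores so that find_oracle's winner rises toward rank 0 more often.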
