Large-scale Expected BLEU Training of Phrase-based Reordering Models

Recent work by Cherry (2013) has shown that directly optimizing phrase-based reordering models towards BLEU can lead to significant gains. However, that approach is limited to small training sets of a few thousand sentences and a similar number of sparse features. We show how the expected BLEU objective allows us to train a simple linear discriminative reordering model with millions of sparse features on hundreds of thousands of sentences, resulting in significant improvements. A comparison to likelihood training demonstrates that expected BLEU is vastly more effective. Our best results improve a hierarchical lexicalized reordering baseline by up to 2.0 BLEU in a single-reference setting on a French-English WMT 2012 setup.

[1]  Christoph Tillmann, et al. A Unigram Orientation Model for Statistical Machine Translation, 2004, NAACL.

[2]  Mark Hopkins, et al. Tuning as Ranking, 2011, EMNLP.

[3]  Richard M. Schwartz, et al. Expected BLEU Training for Graphs: BBN System Description for WMT11 System Combination Task, 2011, WMT@EMNLP.

[4]  Jimmy J. Lin, et al. Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce, 2013, ACL.

[5]  Dekai Wu, et al. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997, CL.

[6]  Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[7]  Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[8]  Daniel Marcu, et al. Scalable Inference and Training of Context-Rich Syntactic Translation Models, 2006, ACL.

[9]  Haitao Mi, et al. Max-Violation Perceptron and Forced Decoding for Scalable MT Training, 2013, EMNLP.

[10]  Richard M. Schwartz, et al. BBN System Description for WMT10 System Combination Task, 2010, WMT@ACL.

[11]  Ben Taskar, et al. An End-to-End Discriminative Approach to Machine Translation, 2006, ACL.

[12]  Jianfeng Gao, et al. Learning Continuous Phrase Representations for Translation Modeling, 2014, ACL.

[13]  Gunnar Rätsch, et al. Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003: Revised Lectures, 2004.

[14]  Hermann Ney, et al. The Alignment Template Approach to Statistical Machine Translation, 2004, CL.

[15]  Akira Shimazu, et al. Improving a Lexicalized Hierarchical Reordering Model Using Maximum Entropy, 2009, MTSUMMIT.

[16]  Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[17]  Qun Liu, et al. Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation, 2006, ACL.

[18]  David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[19]  Christopher D. Manning, et al. A Simple and Effective Hierarchical Phrase Reordering Model, 2008, EMNLP.

[20]  Shankar Kumar, et al. Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation, 2008, EMNLP.

[21]  Daniel Marcu, et al. What’s in a translation rule?, 2004, NAACL.

[22]  Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[23]  Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[24]  Léon Bottou, et al. Stochastic Learning, 2003, Advanced Lectures on Machine Learning.

[25]  Jianfeng Gao, et al. Training MRF-Based Phrase Translation Models using Gradient Ascent, 2013, NAACL.

[26]  Kevin Knight, et al. 11,001 New Features for Statistical Machine Translation, 2009, NAACL.

[27]  Philipp Koehn, et al. Findings of the 2012 Workshop on Statistical Machine Translation, 2012, WMT@NAACL-HLT.

[28]  Preslav Nakov, et al. Optimizing for Sentence-Level BLEU+1 Yields Short Translations, 2012, COLING.

[29]  Colin Cherry. Improved Reordering for Phrase-Based Translation using Sparse Features, 2013, HLT-NAACL.

[30]  Philipp Koehn, et al. Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation, 2005.

[31]  Li Deng, et al. Maximum Expected BLEU Training of Phrase and Lexicon Translation Models, 2012, ACL.

[32]  Christopher D. Manning, et al. An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation, 2014, WMT@ACL.