Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.

[1]  Roland Kuhn,et al.  Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.

[2]  Necip Fazil Ayan,et al.  Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT , 2006, ACL.

[3]  Dimitri Kanevsky,et al.  An inequality for rational functions with applications to some statistical estimation problems , 1991, IEEE Trans. Inf. Theory.

[4]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[5]  L. Baum,et al.  An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology , 1967 .

[6]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[8]  Hwee Tou Ng,et al.  Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms , 2008, EMNLP.

[9]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[10]  Phil Blunsom,et al.  A Discriminative Latent Variable Model for Statistical Machine Translation , 2008, ACL.

[11]  Shankar Kumar,et al.  Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2008, EMNLP.

[12]  Kevin Knight,et al.  11,001 New Features for Statistical Machine Translation , 2009, NAACL.

[13]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[14]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[15]  Jan Niehues,et al.  The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation , 2012, LREC.

[16]  Richard M. Schwartz,et al.  Expected BLEU Training for Graphs: BBN System Description for WMT11 System Combination Task , 2011, WMT@EMNLP.

[17]  Hermann Ney,et al.  Training Phrase Translation Models with Leaving-One-Out , 2010, ACL.

[18]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[19]  Ben Taskar,et al.  An End-to-End Discriminative Approach to Machine Translation , 2006, ACL.

[20]  Robert C. Moore,et al.  Faster beam-search decoding for phrasal statistical machine translation , 2007, MTSUMMIT.

[21]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[22]  Philipp Koehn,et al.  Statistical Significance Tests for Machine Translation Evaluation , 2004, EMNLP.

[23]  Xiaodong He Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation , 2007, WMT@ACL.

[24]  Wu Chou,et al.  Discriminative learning in sequential pattern recognition , 2008, IEEE Signal Processing Magazine.

[25]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[26]  Sebastian Stüker,et al.  Overview of the IWSLT 2011 evaluation campaign , 2011, IWSLT.

[27]  David A. Smith,et al.  Minimum Risk Annealing for Training Log-Linear Models , 2006, ACL.

[28]  Yang Liu,et al.  Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training , 2011, EMNLP.