Paper information - Optimization for Statistical Machine Translation: A Survey

In statistical machine translation (SMT), the optimization of the system parameters to maximize translation accuracy is now a fundamental part of virtually all modern systems. In this article, we survey 12 years of research on optimization for SMT, from the seminal work on discriminative models (Och and Ney 2002) and minimum error rate training (Och 2003), to the most recent advances. Starting with a brief introduction to the fundamentals of SMT systems, we follow by covering a wide variety of optimization algorithms for use in both batch and online optimization. Specifically, we discuss losses based on direct error minimization, maximum likelihood, maximum margin, risk minimization, ranking, and more, along with the appropriate methods for minimizing these losses. We also cover recent topics, including large-scale optimization, nonlinear models, domain-dependent optimization, and the effect of MT evaluation measures or search on optimization. Finally, we discuss the current state of affairs in MT optimization, and point out some unresolved problems that will likely be the target of further research in optimization for MT.

Taro Watanabe | Graham Neubig
