Tuning Methods in Statistical Machine Translation

In a Statistical Machine Translation (SMT) system, many models, called features, complement each other in producing natural language translations. To what extent we should rely on a particular feature is governed by parameters, or weights. Learning these weights is the subfield of SMT called parameter tuning, which is addressed in this thesis. Three existing methods for learning such parameters are compared: we recast MERT, MIRA and Downhill Simplex in a uniform framework to allow for an easy and consistent comparison. Based on our findings and the resulting opportunities for improvement, we introduce two new methods: a straightforward sampling approach, Local Unimodal Sampling (LUS), which uniformly samples from a shrinking area around a continually updated peak in weight-vector space, and a ranking-based approach, implementing SVM-Rank, which focuses on giving a high score not only to the best translation but also to its runners-up. We empirically compare our methods to the existing ones and find that LUS slightly, but significantly, outperforms the state-of-the-art MERT method in a realistic setting with 14 features. We claim that this improvement, the simplicity of the radically different approach that achieves it, and the clear overview of existing work are contributions to the field. Our SVM-Rank approach showed no improvement over the state of the art within our experimental setup.
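
The LUS idea can be summarised in a few lines. The following is a minimal sketch, not the thesis implementation: it assumes a generic `score` function standing in for BLEU computed over re-scored n-best lists, and the shrink factor `decrease` and other hyperparameters are illustrative rather than the exact schedule of Pedersen (2010).

```python
import random

def local_unimodal_sampling(score, dim, lower, upper, iterations=1000,
                            decrease=0.97, seed=0):
    """Minimal Local Unimodal Sampling (LUS) sketch.

    Uniformly samples a candidate weight vector inside a box around the
    current best; the box shrinks whenever a sample fails to improve the
    score. In an SMT tuner, `score` would evaluate BLEU on n-best lists
    re-ranked with the candidate weights (assumption for illustration).
    """
    rng = random.Random(seed)
    best = [rng.uniform(lower, upper) for _ in range(dim)]
    best_score = score(best)
    radius = (upper - lower) / 2.0          # current sampling half-width

    for _ in range(iterations):
        candidate = [w + rng.uniform(-radius, radius) for w in best]
        candidate_score = score(candidate)
        if candidate_score > best_score:    # keep improvements (maximising, e.g. BLEU)
            best, best_score = candidate, candidate_score
        else:
            radius *= decrease              # shrink the sampling area around the peak
    return best, best_score

if __name__ == "__main__":
    # Toy stand-in objective: negative distance to a hidden target weight vector.
    target = [0.3, -0.1, 0.8, 0.5]
    toy_score = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    weights, s = local_unimodal_sampling(toy_score, dim=4, lower=-1.0, upper=1.0)
    print(weights, s)
```

The appeal of the method lies in this simplicity: it needs no gradients, no line search along feature directions, and no n-best merging machinery beyond what the scoring function itself requires.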
