Tuning machine translation parameters with SPSA

Most statistical machine translation systems combine several models, and tuning the scaling factors of these models is an important step. However, this optimisation problem is hard: the objective function has many local minima, and the available algorithms cannot guarantee reaching a global optimum. Consequently, optimisations started from different initial settings can converge to quite different solutions. We present tuning experiments with the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm and compare them with tuning by the widely used downhill simplex method. On IWSLT 2006 Chinese-English data, both methods showed similar performance, but SPSA was more robust to the choice of initial settings.
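For readers unfamiliar with SPSA, the following is a minimal sketch of the generic algorithm, not the implementation used in the experiments. It assumes the objective is a loss to be minimised (in a translation setting this could be, for example, a negated quality score on a development set, which is an assumption here). The function name `spsa_minimize`, the toy quadratic loss, and all default values are hypothetical; the gain exponents 0.602 and 0.101 are the values commonly suggested in Spall's implementation guidelines. The key property is that each iteration needs only two loss evaluations, whatever the number of scaling factors.

```python
import random


def spsa_minimize(loss, theta0, iterations=500, a=0.2, c=0.1, A=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Generic SPSA sketch: minimise `loss` starting from `theta0`.

    All defaults are illustrative assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        # Decaying gain sequences for the step size and the perturbation size.
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        # Perturb every parameter simultaneously with a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations give a gradient estimate for all dimensions.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    # Toy stand-in for a tuning objective: distance to a known optimum.
    target = [1.0, -2.0, 0.5]

    def toy_loss(weights):
        return sum((w - t) ** 2 for w, t in zip(weights, target))

    print(spsa_minimize(toy_loss, [0.0, 0.0, 0.0]))
```

In a real tuning loop, each call to `loss` would involve translating and scoring a development set, so the two-evaluations-per-iteration cost is what makes the method attractive as the number of models grows.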
