Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective

The conditional phrase translation probabilities constitute the principal components of phrase-based machine translation systems. These probabilities are estimated using a heuristic method that does not seem to optimize any reasonable objective function of the word-aligned, parallel training corpus. Earlier efforts to devise a better-understood estimator either do not scale to reasonably sized training data or lead to deteriorating performance. In this paper we explore a new approach based on three ingredients: (1) a generative model with a prior over latent segmentations derived from Inversion Transduction Grammar (ITG), (2) a phrase table containing all phrase pairs without a length limit, and (3) smoothing as a learning objective, using a novel Maximum-A-Posteriori version of Deleted Estimation combined with Expectation-Maximization. Where others conclude that latent segmentations lead to overfitting and deteriorating performance, we show here that these three ingredients give performance equivalent to the heuristic method on reasonably sized training data.
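For concreteness, the heuristic estimator referred to above is commonly implemented as relative frequency over phrase pairs extracted consistently with the word alignment. The Python sketch below assumes that standard recipe; the function names, the `max_len` parameter, and the toy corpus are illustrative, and the extractor omits the usual extension of spans over unaligned boundary words. Passing `max_len=None` keeps every consistent pair regardless of length, in the spirit of ingredient (2). It is a sketch of the baseline only, not of the ITG-prior model or the Deleted Estimation procedure proposed here.

```python
from collections import Counter, defaultdict


def extract_phrase_pairs(src, tgt, alignment, max_len=None):
    """Extract (source phrase, target phrase) pairs consistent with a word alignment.

    src, tgt: token lists; alignment: set of (i, j) links, i into src, j into tgt.
    A span pair is kept if it contains at least one link and no link connects a
    word inside one span to a word outside the other. max_len=None means no limit.
    Simplification (assumption): unaligned boundary words never extend a span.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            if max_len is not None and i2 - i1 + 1 > max_len:
                break
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if max_len is not None and j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word in [j1, j2] links to a source word outside [i1, i2].
            if any((j1 <= j <= j2) and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def relative_frequency(corpus, max_len=None):
    """Heuristic estimate p(tgt_phrase | src_phrase) = c(src, tgt) / sum_t c(src, t)."""
    counts = Counter()
    for src, tgt, alignment in corpus:
        counts.update(extract_phrase_pairs(src, tgt, alignment, max_len))
    # Normalize each count by the total count of its source phrase.
    totals = defaultdict(int)
    for (s, _t), c in counts.items():
        totals[s] += c
    return {(s, t): c / totals[s] for (s, t), c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus of (source tokens, target tokens, alignment links).
    toy_corpus = [
        ("das haus".split(), "the house".split(), {(0, 0), (1, 1)}),
        ("das buch".split(), "the book".split(), {(0, 0), (1, 1)}),
        (["das"], ["that"], {(0, 0)}),
    ]
    table = relative_frequency(toy_corpus)
    # "das" is paired with "the" twice and with "that" once, so this is 2/3.
    print(table[("das", "the")])
```

Note that every extracted pair is counted as observed data, with no latent segmentation and no objective being optimized; this is the property the abstract contrasts with its generative, smoothed estimator.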
