Improved Phrase Translation Modeling Using Maximum A-Posteriori (MAP) Adaptation

Abstract : In this paper, we explore several methods of improving the estimation of translation model probabilities for phrase-based statistical machine translation given in-domain data sparsity. We introduce a hierarchical variant of MAP adaptation for domain adaptation with an arbitrary number of out-of-domain models. We compare this adaptation technique to linear interpolation and phrase table fill-up. Additionally, we note that domain adaptation can have a smoothing effect, and we explore the interaction between smoothing and the incorporation of out-of-domain data. We find that the relative contributions of smoothing and interpolation depend on the datasets used. For both the IWSLT 2011 and WMT 2011 English-French datasets, the MAP adaptation method we present improves on a baseline system by 1.5+ BLEU points.

[1]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[2]  Marcello Federico,et al.  Bayesian estimation methods for n-gram language model adaptation , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  Philipp Koehn,et al.  Findings of the 2011 Workshop on Statistical Machine Translation , 2011, WMT@EMNLP.

[4]  George F. Foster,et al.  Unpacking and Transforming Feature Functions: New Ways to Smooth Phrase Tables , 2011, MTSUMMIT.

[5]  Mauro Cettolo,et al.  The ITC-irst SMT system for IWSLT 2006 , 2006, IWSLT.

[6]  Timothy R. Anderson,et al.  The MIT-LL/AFRL IWSLT-2006 MT system , 2006, IWSLT.

[7]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[8]  B. Martin PARAMETER ESTIMATION , 2012, Statistical Methods for Biomedical Research.

[9]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[10]  Roland Kuhn,et al.  Mixture-Model Adaptation for SMT , 2007, WMT@ACL.

[11]  Sebastian Stüker,et al.  Overview of the IWSLT 2011 evaluation campaign , 2011, IWSLT.

[12]  William M. Campbell,et al.  Speaker Verification Using Support Vector Machines and High-Level Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  James R. Glass,et al.  Iterative language model estimation: efficient data structure & algorithms , 2008, INTERSPEECH.

[14]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[15]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[16]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[17]  Arianna Bisazza,et al.  Fill-up versus interpolation methods for phrase-based SMT adaptation , 2011, IWSLT.

[18]  Chris Callison-Burch,et al.  Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding , 2006 .

[19]  Brian Roark,et al.  MAP adaptation of stochastic grammars , 2006, Comput. Speech Lang..

[20]  Roland Kuhn,et al.  Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.