Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation

This paper addresses dynamic model parameter selection for statistical machine translation (SMT) systems based on log-linear models. We propose a principled method that recasts this task as a test-data-dependent development set selection problem. We present two algorithms for automatic development set construction and evaluate the method on several NIST data sets for the Chinese-English translation task. Experimental results show that the method effectively adapts log-linear model parameters to different test data and consistently achieves good translation performance compared with conventional methods that use a single fixed parameter setting across data sets.
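The idea of choosing a development set to match the test data can be illustrated with a minimal sketch. The abstract does not describe the two proposed algorithms, so the code below is not either of them. It assumes a simple stand-in criterion: rank a pool of candidate development sentences by bag-of-words cosine similarity to the test source text and keep the top few. The names `bag_of_words`, `cosine`, and `select_dev_set` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased whitespace tokens as a term-frequency vector."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_dev_set(test_sources, dev_pool, size):
    """Pick the `size` pool entries whose source side is closest to the test set.

    dev_pool is a list of (source, reference) pairs. Summarizing the whole
    test set as one bag of words is an assumption of this sketch, not a
    detail taken from the paper.
    """
    test_profile = Counter()
    for sentence in test_sources:
        test_profile.update(bag_of_words(sentence))
    ranked = sorted(
        dev_pool,
        key=lambda pair: cosine(bag_of_words(pair[0]), test_profile),
        reverse=True,
    )
    return ranked[:size]
```

In a setup like this, the selected pairs would then serve as the tuning set for the log-linear weights, for example with minimum error rate training, so that each test set gets its own parameter setting instead of a shared fixed one.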
