Forest-to-String Statistical Translation Rules

In this paper, we propose forest-to-string rules to enhance the expressive power of tree-to-string translation models. A forestto-string rule is capable of capturing nonsyntactic phrase pairs by describing the correspondence between multiple parse trees and one string. To integrate these rules into tree-to-string translation models, auxiliary rules are introduced to provide a generalization level. Experimental results show that, on the NIST 2005 Chinese-English test set, the tree-to-string model augmented with forest-to-string rules achieves a relative improvement of 4.3% in terms of BLEU score over the original model which allows treeto-string rules only.

[1]  Yang Liu,et al.  Tree-to-String Alignment Template for Statistical Machine Translation , 2006, ACL.

[2]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[3]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[4]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[5]  Daniel Marcu,et al.  SPMT: Statistical Machine Translation with Syntactified Target Language Phrases , 2006, EMNLP.

[6]  Chris Quirk,et al.  The impact of parse quality on syntactically-informed statistical machine translation , 2006, EMNLP.

[7]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[8]  Hermann Ney,et al.  Improved Statistical Alignment Models , 2000, ACL.

[9]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[10]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[11]  Stephan Vogel,et al.  Considerations in maximum mutual information and minimum classification error training for statistical machine translation , 2005, EAMT.

[12]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[13]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[14]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[15]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[16]  Qun Liu,et al.  Parsing the Penn Chinese Treebank with Semantic Knowledge , 2005, IJCNLP.

[17]  Ying Zhang,et al.  Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System? , 2004, LREC.

[18]  Daniel Marcu,et al.  Scalable Inference and Training of Context-Rich Syntactic Translation Models , 2006, ACL.