Forest-based Tree Sequence to String Translation Model

This paper proposes a forest-based tree sequence to string translation model for syntax-based statistical machine translation. The model automatically learns tree sequence to string translation rules from word-aligned bilingual texts whose source side has been parsed. It combines the strengths of tree sequence-based and forest-based translation models: it exploits a forest structure, which compactly encodes an exponential number of parse trees, and it uses tree sequences to capture non-syntactic translation equivalences together with linguistically structured information. This makes the model potentially more robust to parse errors and structural divergence. Experimental results on the NIST MT-2003 Chinese-English translation task show that the method outperforms four baseline systems by a statistically significant margin.
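
As an illustration of why a forest is a compact encoding, the following minimal Python sketch counts the parse trees packed into a toy forest. It is not the paper's implementation: the dictionary layout, the node names (N0, N1, ...), the edge labels, and the functions build_toy_forest and count_trees are all assumptions made for this example. Each node offers two alternative analyses over a shared child, so the number of hyperedges grows linearly with depth while the number of encoded trees doubles at every level.

```python
def build_toy_forest(depth):
    """Build a hypothetical packed forest as {node: [(label, children), ...]}.

    Every internal node has two alternative hyperedges that share the same
    child node, which is what lets the forest stay small.
    """
    forest = {f"N{depth}": [("leaf", ())]}
    for i in range(depth):
        child = f"N{i + 1}"
        forest[f"N{i}"] = [("analysis_a", (child,)), ("analysis_b", (child,))]
    return forest


def count_trees(forest, root):
    """Count the distinct trees rooted at `root` with memoised recursion."""
    memo = {}

    def visit(node):
        if node not in memo:
            total = 0
            for _label, children in forest[node]:
                # Trees under one hyperedge: product over its children.
                product = 1
                for child in children:
                    product *= visit(child)
                # Alternative hyperedges add up.
                total += product
            memo[node] = total
        return memo[node]

    return visit(root)


if __name__ == "__main__":
    toy = build_toy_forest(10)
    n_edges = sum(len(edges) for edges in toy.values())
    print(n_edges, count_trees(toy, "N0"))
```

With a depth of 10, the toy forest stores 21 hyperedges but encodes 2^10 = 1024 distinct trees; a translation model that reads the forest directly can therefore consider alternatives that a 1-best or k-best parser output would discard.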
