A Discriminative Syntactic Model for Source Permutation via Tree Transduction

A major challenge in statistical machine translation is mitigating the word order differences between source and target strings. While reordering and lexical translation choices are often made in tandem, permuting the source string prior to translation is attractive because it allows reordering to be studied using hierarchical and syntactic structure. This work contributes an approach for learning source string permutation via transfer of the source syntax tree. We present a novel discriminative, probabilistic tree transduction model, and contribute a set of empirical upper bounds on translation performance for English-to-Dutch source string permutation under sequence and parse tree constraints. Finally, we show that translation with our learned model significantly outperforms a state-of-the-art phrase-based system.
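To make the idea of permuting a source string through its syntax tree concrete, the following is a minimal sketch, not the model described above. It assumes a toy constituency tree and a hand-written, deterministic rule table (`rules`) keyed on a parent label and its child labels; the names `Node`, `yield_of`, and `permute` are hypothetical. The paper's model is discriminative and probabilistic, whereas this sketch only illustrates how reordering children in the tree induces a permutation of the source words.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A parse tree node; leaves carry a word, internal nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""


def yield_of(node: Node) -> List[str]:
    """Return the words at the leaves, left to right."""
    if not node.children:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(yield_of(child))
    return words


# Hypothetical rule table: (parent label, child labels) -> new child order.
Rules = Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]]


def permute(node: Node, rules: Rules) -> Node:
    """Reorder children bottom-up wherever a rule matches; otherwise keep order."""
    if not node.children:
        return node
    kids = [permute(child, rules) for child in node.children]
    key = (node.label, tuple(k.label for k in kids))
    order = rules.get(key, tuple(range(len(kids))))
    return Node(node.label, [kids[i] for i in order])


# Toy English source: "I have seen the movie".
tree = Node("S", [
    Node("NP", [Node("PRP", word="I")]),
    Node("VP", [
        Node("AUX", word="have"),
        Node("VP", [
            Node("VBN", word="seen"),
            Node("NP", [Node("DT", word="the"), Node("NN", word="movie")]),
        ]),
    ]),
])

# Assumed rule: move the participle after its object, as in Dutch verb-final order.
rules: Rules = {("VP", ("VBN", "NP")): (1, 0)}

permuted = yield_of(permute(tree, rules))
print(" ".join(permuted))
```

In this sketch the permuted English string follows the order of the Dutch "ik heb de film gezien", so a downstream translation system would face less long-distance reordering. A learned model would score candidate child orders from features of the tree rather than look them up in a fixed table.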
