Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation

We present a constituent parsing-based reordering technique that improves a state-of-the-art English-to-Japanese phrase translation system with distortion models by 4.76 BLEU points. With reordering applied as a pre-processing step, the phrase translation model outperforms a syntax-based translation system that combines a phrase translation model, a hierarchical phrase-based translation model, and a tree-to-string grammar. We also show that combining constituent reordering with the syntax model improves translation quality by an additional 0.84 BLEU points.
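To make the idea of reordering at the pre-processing stage concrete, the following is a minimal sketch of rearranging an English constituent tree toward Japanese-like head-final order before the sentence is passed to a phrase translation system. The tree encoding, the `HEAD_TAG_PREFIXES` table, and the two rules (move a leading verb to the end of its VP, move a leading preposition to the end of its PP) are illustrative assumptions, not the rule set used in this work.

```python
# Illustrative sketch only: the rules here are assumptions, not the paper's rules.
# A tree is a tuple (label, child, child, ...); a leaf is (pos_tag, "word").

# Hypothetical table: for each phrase label, the POS-tag prefix of the
# leading child that should be moved to the end of the phrase.
HEAD_TAG_PREFIXES = {"VP": "VB", "PP": "IN"}


def is_leaf(node):
    """A leaf pairs a POS tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def reorder(node):
    """Recursively move a leading verb or preposition to the end of its phrase."""
    if is_leaf(node):
        return node
    label = node[0]
    children = [reorder(child) for child in node[1:]]
    prefix = HEAD_TAG_PREFIXES.get(label)
    if prefix is not None and len(children) > 1:
        first = children[0]
        if is_leaf(first) and first[0].startswith(prefix):
            children = children[1:] + [first]
    return (label, *children)


def words(node):
    """Read the words of a tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


tree = (
    "S",
    ("NP", ("PRP", "he")),
    ("VP",
     ("VBD", "bought"),
     ("NP", ("DT", "a"), ("NN", "book")),
     ("PP", ("IN", "in"), ("NP", ("NNP", "Tokyo")))),
)

print(" ".join(words(reorder(tree))))  # he a book Tokyo in bought
```

The reordered word sequence places the verb last and the preposition after its noun phrase, which is closer to Japanese word order, so a downstream phrase-based system has less long-distance reordering to handle. A practical system would rely on a trained constituent parser and a much richer set of rules.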
