A Ranking-based Approach to Word Reordering for Statistical Machine Translation

Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.

[1]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[2]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[3]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[4]  Franz Josef Och,et al.  Statistical machine translation: from single word models to alignment templates , 2002 .

[5]  Yuji Matsumoto,et al.  Japanese Dependency Analysis using Cascaded Chunking , 2002, CoNLL.

[6]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[7]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[8]  Joakim Nivre,et al.  Deterministic Dependency Parsing of English Text , 2004, COLING.

[9]  Philipp Koehn,et al.  Explorer Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation , 2005 .

[10]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[11]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[12]  Philipp Koehn,et al.  Clause Restructuring for Statistical Machine Translation , 2005, ACL.

[13]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[14]  Qun Liu,et al.  Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation , 2006, ACL.

[15]  Yang Liu,et al.  Tree-to-String Alignment Template for Statistical Machine Translation , 2006, ACL.

[16]  Hermann Ney,et al.  Discriminative Reordering Models for Statistical Machine Translation , 2006, WMT@HLT-NAACL.

[17]  Daniel Marcu,et al.  Scalable Inference and Training of Context-Rich Syntactic Translation Models , 2006, ACL.

[18]  Ming Zhou,et al.  A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation , 2007, ACL.

[19]  Chao Wang,et al.  Chinese Syntactic Reordering for Statistical Machine Translation , 2007, EMNLP.

[20]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[21]  Pushpak Bhattacharyya,et al.  Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation , 2008, IJCNLP.

[22]  Christopher D. Manning,et al.  A Simple and Effective Hierarchical Phrase Reordering Model , 2008, EMNLP.

[23]  Peng Xu,et al.  Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages , 2009, NAACL.

[24]  Jason Eisner,et al.  Search and learning for the linear ordering problem with an application to machine translation , 2009 .

[25]  Karthik Visweswariah,et al.  Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation , 2010, COLING.

[26]  Dmitriy Genzel,et al.  Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation , 2010, COLING.

[27]  Xiaoqiang Luo,et al.  Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation , 2010, COLING.