Dependency-based Reordering Model for Constituent Pairs in Hierarchical SMT

We propose a novel dependency-based reordering model for hierarchical SMT that predicts the translation order of two types of constituent pairs in the source tree: head-dependent and dependent-dependent. The model uses the dependency structure of the source sentence to capture medium- and long-distance reorderings between these pairs of constituents. We describe our reordering model in detail and then apply it to a language pair whose languages follow different word order patterns: English (SVO) and Farsi (free word order, with SOV the most frequent pattern). Our model outperforms a baseline (standard hierarchical SMT) by 0.78 BLEU points absolute, statistically significant at p = 0.01.
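To make the two pair types concrete, the following is a minimal sketch of how they could be enumerated from a source dependency tree. It is an illustration only, not the implementation described in the paper: the function name `constituent_pairs`, the head-index encoding of the tree, and the reading of "dependent-dependent" as two dependents sharing the same head are all assumptions.

```python
from itertools import combinations


def constituent_pairs(heads):
    """Enumerate candidate pairs from a dependency tree (hypothetical helper).

    heads[i] is the index of token i's head, or -1 for the root.
    Returns (head_dep, dep_dep), both as lists of token-index pairs.
    """
    # Group dependents under their head, keeping source order.
    children = {}
    for dep, head in enumerate(heads):
        if head >= 0:
            children.setdefault(head, []).append(dep)

    # Head-dependent pairs: each head with each of its direct dependents.
    head_dep = [(h, d) for h, deps in sorted(children.items()) for d in deps]

    # Dependent-dependent pairs: assumed here to be dependents sharing a head.
    dep_dep = [
        pair
        for _, deps in sorted(children.items())
        for pair in combinations(deps, 2)
    ]
    return head_dep, dep_dep


# Toy sentence: "John saw Mary yesterday", with "saw" as the root.
tokens = ["John", "saw", "Mary", "yesterday"]
heads = [1, -1, 1, 1]
head_dep, dep_dep = constituent_pairs(heads)

for h, d in head_dep:
    print("head-dependent:", tokens[h], tokens[d])
for a, b in dep_dep:
    print("dependent-dependent:", tokens[a], tokens[b])
```

A reordering model of the kind described would then predict, for each such pair, whether its source order is kept or swapped in translation. That prediction step is not sketched here.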
