A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation

This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context-free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on both models and use them as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. However, the gain achieved by the semantic reordering model is limited in the presence of the syntactic reordering model, and we therefore provide a detailed analysis of the behavioral differences between the two.
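To make "soft constraint" concrete, here is a minimal sketch, assuming a standard log-linear decoder in which each derivation is scored as a weighted sum of feature values. The reordering features add the log-probability of the orientation (monotone or swap) that a rule imposes on two source units. A syntactic table keys on sibling constituent labels, and a semantic table keys on an argument role and its predicate. All labels, probabilities, weights, and feature names (`tm`, `lm`, `syn_reorder`, `sem_reorder`) are hypothetical illustrations, not the paper's actual feature definitions or trained values.

```python
import math

# Hypothetical orientation tables: P(swap | left unit, right unit).
# A real system would use trained classifiers; these numbers are invented.
SYN_SWAP_PROB = {("PP", "VP"): 0.8, ("NP", "VP"): 0.1}             # sibling syntactic constituents
SEM_SWAP_PROB = {("ARGM-LOC", "PRED"): 0.7, ("A0", "PRED"): 0.05}  # argument relative to its predicate


def orientation_logprob(table, left, right, swapped, default=0.5):
    """Log-probability of the orientation (monotone or swap) a rule imposes on two source units."""
    p_swap = table.get((left, right), default)
    return math.log(p_swap if swapped else 1.0 - p_swap)


def score(features, weights):
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def derivation(tm, lm, swapped):
    """Feature vector of a toy derivation covering a Chinese 'PP VP' span (e.g. a locative PP before the verb)."""
    return {
        "tm": tm,  # hypothetical translation-model log score
        "lm": lm,  # hypothetical language-model log score
        "syn_reorder": orientation_logprob(SYN_SWAP_PROB, "PP", "VP", swapped),
        "sem_reorder": orientation_logprob(SEM_SWAP_PROB, "ARGM-LOC", "PRED", swapped),
    }


weights = {"tm": 1.0, "lm": 1.0, "syn_reorder": 0.5, "sem_reorder": 0.3}

# Two competing derivations for the same span. Neither is forbidden:
# the reordering features only shift their scores (a soft constraint).
mono_d = derivation(tm=-2.0, lm=-4.0, swapped=False)
swap_d = derivation(tm=-2.2, lm=-4.1, swapped=True)

print(score(mono_d, weights), score(swap_d, weights))
```

With these toy numbers the swapped derivation scores higher (about -6.52 vs. -7.17). Setting both reordering weights to zero makes the monotone derivation win (-6.0 vs. -6.3), so the features change the ranking without pruning either candidate. Because both tables favor moving the locative PP after the verb, the two features largely agree here, which loosely mirrors why a semantic reordering signal can add little on top of a syntactic one.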
