Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation

We present an accurate word alignment algorithm that heavily exploits source- and target-language syntax. Using a discriminative framework and an efficient bottom-up search algorithm, we train a model with hundreds of thousands of syntactic features. Our new model (1) helps us to very accurately model syntactic transformations between languages; (2) is language-independent; and (3) with automatic feature extraction, assists system developers in obtaining good word-alignment performance off-the-shelf when tackling new language pairs. We analyze the impact of our features, describe inference under the model, and demonstrate significant alignment and translation quality improvements over already-powerful baselines trained on very large corpora. We observe translation quality improvements of 1.0 BLEU for Arabic-English and 1.3 BLEU for Chinese-English.
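
To make "discriminative framework" concrete, here is a minimal sketch of a linear alignment model over sparse features, where an alignment a is scored as w · f(a). It is not the authors' method: the toy lexical and distortion features stand in for the paper's syntactic features, and the training step is a generic structured perceptron update, chosen only because the abstract does not name its training algorithm. All function and feature names (features, score, perceptron_update, "pair=", "dist=", "unaligned_tgt") are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Link = Tuple[int, int]            # (source index, target index)
Alignment = FrozenSet[Link]       # a word alignment is a set of links


def features(src: List[str], tgt: List[str], align: Alignment) -> Counter:
    """Extract hypothetical sparse features; real systems use syntactic ones."""
    feats: Counter = Counter()
    for i, j in align:
        # Lexical pair feature, e.g. "pair=casa|house".
        feats[f"pair={src[i]}|{tgt[j]}"] += 1
        # Bucketed distortion feature on the position difference.
        feats[f"dist={min(abs(i - j), 3)}"] += 1
    # Count target words that received no link.
    aligned_tgt = {j for _, j in align}
    feats["unaligned_tgt"] += len(tgt) - len(aligned_tgt)
    return feats


def score(weights: Dict[str, float], feats: Counter) -> float:
    """Linear model score w . f(a); unseen features have weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def perceptron_update(weights: Dict[str, float], src: List[str], tgt: List[str],
                      gold: Alignment, predicted: Alignment, lr: float = 1.0) -> None:
    """Generic structured perceptron step: w += lr * (f(gold) - f(predicted))."""
    if gold == predicted:
        return
    delta = features(src, tgt, gold)
    delta.subtract(features(src, tgt, predicted))  # keeps negative counts
    for name, value in delta.items():
        if value:
            weights[name] = weights.get(name, 0.0) + lr * value
```

After one update on a toy sentence pair, the gold alignment outscores the crossed one. In a full system, the predicted alignment would come from a search procedure (the paper uses a bottom-up search) rather than from a hand-picked candidate.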
