Maximum-entropy word alignment and posterior-based phrase extraction for machine translation

A fundamental assumption in statistical machine translation (SMT) is that the correspondence between a sentence and its translation can be explained by an alignment between their words. Such alignment information is typically not observed in the parallel corpora used to build the phrase table of an SMT system. It is therefore customary to estimate a probabilistic model of the hidden word alignment, which is then used to extract bilingual phrase pairs. Standard extraction heuristics under-exploit this model: the only information they use from the posterior distribution is the Viterbi (single best) alignment. This is due to the high computational complexity of the IBM models, which are the de facto standard for computing these alignments. These models have further limitations, including their asymmetry and their inability to integrate rich, feature-based descriptions. We argue that refining the word alignment model in a discriminative maximum-entropy framework substantially improves alignment quality. We also show that these improved alignments, combined with efficient and accurate computation of the link posterior distributions, can improve overall translation performance, especially when posterior-based extraction methods are applied.
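To make the notion of link posteriors concrete, the following is a minimal sketch rather than the method of the paper. It assumes that each candidate link between a source and a target word is scored independently by a binary maximum-entropy (logistic) model, and that links are kept when their posterior exceeds a threshold. The feature names, weights, toy lexicon, and function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy lexicon standing in for a real association feature.
TOY_LEXICON = {("la", "the"), ("maison", "house"), ("bleue", "blue")}

# Hypothetical weights; a real system would learn them from annotated data.
WEIGHTS = {"bias": -2.0, "same_position": 1.0, "in_lexicon": 3.0}


def toy_features(src, i, tgt, j):
    """Feature vector for the candidate link (i, j); illustrative only."""
    return {
        "bias": 1.0,
        "same_position": 1.0 if i == j else 0.0,
        "in_lexicon": 1.0 if (src[i], tgt[j]) in TOY_LEXICON else 0.0,
    }


def link_posterior(features, weights):
    """P(link is active | features) under a binary maximum-entropy model."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def posterior_matrix(src, tgt, feature_fn, weights):
    """Matrix of link posteriors, one row per source word."""
    return [
        [link_posterior(feature_fn(src, i, tgt, j), weights) for j in range(len(tgt))]
        for i in range(len(src))
    ]


def links_above(matrix, threshold):
    """Set of (source index, target index) links with posterior >= threshold."""
    return {
        (i, j)
        for i, row in enumerate(matrix)
        for j, p in enumerate(row)
        if p >= threshold
    }


src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
matrix = posterior_matrix(src, tgt, toy_features, WEIGHTS)

confident_links = links_above(matrix, 0.5)  # high-precision link set
permissive_links = links_above(matrix, 0.2)  # also admits less certain links
```

Lowering the threshold admits less certain links, which is the kind of information a single best alignment discards and a posterior-based extraction method can use.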
