Improved word alignments for statistical machine translation

All state-of-the-art statistical machine translation systems, and many example-based machine translation systems, depend on an annotation of word-level translational correspondence between pairs of parallel sentences. Such an annotation of two parallel sentences is called a "word alignment". The largest set of manually annotated word alignments currently available to the research community for any language pair covers only thousands of parallel sentences, even though several orders of magnitude more parallel sentences are available. For instance, for the task of translating Chinese news articles into English, there are currently on the order of 10 million parallel sentences. This is too many to align manually, so they must be word aligned automatically. Unsupervised word alignment systems, which often rely on statistical word alignment models developed over 10 years ago, generate poor-quality alignments, yet most recent research into improving word alignments has not led to improved translation.

There are several reasons for this: (1) There is no good metric that can be used to automatically measure word alignment quality for the translation task. (2) Statistical word alignment models are based on incorrect assumptions about the structure of the problem. (3) It is difficult to add new sources of linguistic knowledge, because many current systems must be completely reengineered for each new knowledge source. (4) Statistical models of word alignment are most often learned in an unsupervised training process that is unable to take advantage of annotated data.

This thesis remedies these problems by making contributions in the following three areas: (1) We have found a new method for automatically measuring alignment quality using an unbalanced F-Measure metric. We have validated that this metric adequately measures alignment quality for the translation task. We have shown that the metric can be used to derive a loss function for discriminative training approaches, and that it is useful for measuring progress during the development of new word alignment procedures. (2) We have designed a new statistical model for word alignment called LEAF, which directly models the word alignment structure as it is used for machine translation, in contrast with previous models, which make unreasonable structural assumptions. (3) We have developed a semi-supervised training algorithm, the EMD algorithm, which automatically takes advantage of whatever quantity of manually annotated data can be obtained. The use of the EMD algorithm allows new knowledge sources to be introduced with minimal effort. We have shown that these contributions improve state-of-the-art statistical machine translation systems in experiments on challenging large data sets.
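
The unbalanced F-Measure mentioned among the contributions can be illustrated with a small sketch. The abstract does not state the formula, so the code below assumes the usual weighted harmonic mean of precision and recall over alignment links, F = 1 / (alpha / precision + (1 - alpha) / recall). The function name, the representation of links as (source index, target index) pairs, and the example values of alpha are assumptions made for illustration, not details taken from the thesis.

```python
def alignment_f_measure(hypothesis, gold, alpha=0.5):
    """Weighted F-measure between hypothesized and gold alignment links.

    Links are (source_index, target_index) pairs (an illustrative choice).
    alpha is a hypothetical trade-off weight: alpha = 0.5 gives the
    balanced F-measure, alpha < 0.5 puts more weight on recall, and
    alpha > 0.5 puts more weight on precision.
    """
    hypothesis, gold = set(hypothesis), set(gold)
    correct = len(hypothesis & gold)
    if correct == 0:
        # No shared links (or an empty set): precision or recall is zero.
        return 0.0
    precision = correct / len(hypothesis)
    recall = correct / len(gold)
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Toy example: 4 gold links, 3 hypothesized links, 2 of them correct.
gold_links = {(0, 0), (1, 1), (2, 2), (3, 2)}
hyp_links = {(0, 0), (1, 1), (2, 3)}

balanced = alignment_f_measure(hyp_links, gold_links, alpha=0.5)
recall_weighted = alignment_f_measure(hyp_links, gold_links, alpha=0.3)
```

In this toy case precision (2/3) is higher than recall (1/2), so shifting weight toward recall lowers the score. Which value of alpha tracks translation quality best is an empirical question that a sketch like this does not answer.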
