Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation

Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task, and assumptions have been made about measuring alignment quality for machine translation that, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way that is predictive of statistical machine translation performance.
