Probabilistic Finite State Machines for Regression-based MT Evaluation

Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06, 08; WMT06-08).
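To make the idea of a weighted edit distance concrete, the following is a minimal sketch and not the model described above. It assumes fixed, hand-set costs per edit operation and returns the cost of the single cheapest edit path, whereas the abstract describes weights that are learned automatically over a rich feature set. The names `weighted_edit_distance` and `DEFAULT_COSTS` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical per-operation costs. In a learned model these would be
# fitted weights rather than hand-set constants.
DEFAULT_COSTS: Dict[str, float] = {
    "match": 0.0,
    "substitute": 1.0,
    "insert": 1.0,
    "delete": 1.0,
}


def weighted_edit_distance(
    hyp: List[str],
    ref: List[str],
    costs: Dict[str, float] = DEFAULT_COSTS,
) -> float:
    """Minimum total cost of editing the token list `hyp` into `ref`."""
    n, m = len(hyp), len(ref)
    # dist[i][j] holds the cheapest cost of turning hyp[:i] into ref[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Turning a non-empty prefix of hyp into an empty reference: delete all.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + costs["delete"]
    # Turning an empty hypothesis into a prefix of ref: insert all.
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + costs["insert"]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = hyp[i - 1] == ref[j - 1]
            diagonal = costs["match"] if same else costs["substitute"]
            dist[i][j] = min(
                dist[i - 1][j - 1] + diagonal,        # match or substitute
                dist[i - 1][j] + costs["delete"],     # drop a hypothesis token
                dist[i][j - 1] + costs["insert"],     # add a reference token
            )
    return dist[n][m]


if __name__ == "__main__":
    hypothesis = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(weighted_edit_distance(hypothesis, reference))
```

A plain edit distance of this kind only allows monotone alignments, so it cannot represent word swaps or crossing alignments, which is the limitation the pushdown automaton extension is proposed to address.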
