Probabilistic Finite State Machines for Regression-based MT Evaluation

Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06, 08; WMT06-08).
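To make the idea of a weighted edit distance concrete, here is a minimal sketch of the classic dynamic program with per-operation weights. It is an illustration only, not the authors' pFSM model: the function name `weighted_edit_distance`, the `DEFAULT_WEIGHTS` table, and the four operation names are assumptions introduced here. The weights are fixed by hand, whereas the abstract describes learning feature weights automatically.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical per-operation costs, chosen by hand for illustration.
# In a learned model these would be fit to human quality judgments.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "match": 0.0,
    "substitute": 1.0,
    "insert": 1.0,
    "delete": 1.0,
}


def weighted_edit_distance(
    hyp: Sequence[str],
    ref: Sequence[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return the minimum total cost of editing `hyp` into `ref`."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    n, m = len(hyp), len(ref)

    # dist[i][j] = cheapest way to turn hyp[:i] into ref[:j].
    dist: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + w["delete"]
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + w["insert"]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = "match" if hyp[i - 1] == ref[j - 1] else "substitute"
            dist[i][j] = min(
                dist[i - 1][j - 1] + w[diagonal],  # align the two words
                dist[i - 1][j] + w["delete"],      # drop a hypothesis word
                dist[i][j - 1] + w["insert"],      # add a reference word
            )
    return dist[n][m]
```

A call such as `weighted_edit_distance("the cat sat".split(), "the dog sat".split())` compares a system translation with a reference word by word. This plain formulation has no operation for swapping words, which is the limitation the abstract's pushdown automaton extension is meant to address.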

Christopher D. Manning | Mengqiu Wang
