Paper information - SPEDE: Probabilistic Edit Distance Metrics for MT Evaluation


This paper describes Stanford University's submission to the Shared Evaluation Task of WMT 2012. Our proposed metric (SPEDE) computes probabilistic edit distance as predictions of translation quality. We learn weighted edit distance in a probabilistic finite state machine (pFSM) model, where state transitions correspond to edit operations. While standard edit distance models cannot capture long-distance word swapping or cross alignments, we rectify these shortcomings using a novel pushdown automaton extension of the pFSM model. Our models are trained in a regression framework, and can easily incorporate a rich set of linguistic features. Evaluated on two different prediction tasks across a diverse set of datasets, our methods achieve state-of-the-art correlation with human judgments.
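As a rough illustration of the weighted edit distance idea in the abstract, the sketch below scores a hypothesis against a reference with per-operation costs. It is not the paper's pFSM model: the function name `weighted_edit_distance`, the `WEIGHTS` table, and its values are assumptions made for this example, and the weights are fixed by hand here rather than learned in a regression framework.

```python
from typing import Dict, List

# Hypothetical, hand-set costs; the paper learns its weights from data.
WEIGHTS: Dict[str, float] = {
    "match": 0.0,
    "substitute": 1.0,
    "insert": 0.5,
    "delete": 0.5,
}


def weighted_edit_distance(
    hyp: List[str], ref: List[str], weights: Dict[str, float] = WEIGHTS
) -> float:
    """Return the minimum total cost of editing hyp into ref, word by word."""
    n, m = len(hyp), len(ref)
    # dist[i][j] holds the cheapest cost of turning hyp[:i] into ref[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + weights["delete"]
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + weights["insert"]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = hyp[i - 1] == ref[j - 1]
            diagonal = weights["match"] if same else weights["substitute"]
            dist[i][j] = min(
                dist[i - 1][j - 1] + diagonal,          # match or substitute
                dist[i - 1][j] + weights["delete"],     # drop a hypothesis word
                dist[i][j - 1] + weights["insert"],     # add a reference word
            )
    return dist[n][m]


if __name__ == "__main__":
    hyp = "the cat sat on mat".split()
    ref = "the cat sat on the mat".split()
    print(weighted_edit_distance(hyp, ref))
```

Under these assumed costs, a swapped word pair such as `["a", "b"]` versus `["b", "a"]` is charged as two separate edits, which is the kind of limitation of standard edit distance that the abstract says the pushdown automaton extension is meant to address.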

Christopher D. Manning | Mengqiu Wang
