A Dataset for Assessing Machine Translation Evaluation Metrics

We describe a dataset containing 16,000 translations produced by four machine translation systems and manually annotated for quality by professional translators. This dataset can be used in a range of tasks for assessing machine translation evaluation metrics, from basic correlation analysis to training and testing machine learning-based metrics. By providing a standard dataset for such tasks, we hope to encourage the development of better MT evaluation metrics.
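The simplest task mentioned in the abstract is correlation analysis: score each translation with an automatic metric, then measure how closely those scores track the human quality annotations. The following is a minimal sketch in plain Python (standard library only) that computes Pearson's r and Spearman's rank correlation. The function names and the score lists in the usage lines are hypothetical illustrations, not part of the dataset or any tooling released with it.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's r between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy scores for five translations (not from the dataset).
    human = [4, 3, 3, 1, 2]
    metric = [0.62, 0.48, 0.51, 0.10, 0.30]
    print("Pearson:", round(pearson(metric, human), 3))
    print("Spearman:", round(spearman(metric, human), 3))
```

Spearman's rank correlation is often preferred when human judgments are ordinal, since it only assumes the metric orders translations the same way the annotators do. On Python 3.10 and later, `statistics.correlation` computes Pearson's r directly, and from 3.12 it accepts `method="ranked"` for Spearman's rho.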

Lucia Specia | Marc Dymetman | Nicola Cancedda