A Study of Translation Edit Rate with Targeted Human Annotation

We examine a new, intuitive measure for evaluating machine-translation output that avoids both the knowledge-intensiveness of more meaning-based approaches and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so that it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (HTER) and show that it yields higher correlations with human judgments than BLEU, even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR (human-targeted METEOR), and that the four-reference variants of TER and HTER correlate with human judgments as well as, or better than, a second human judgment does.

M. Snover | B. Dorr | Richard M. Schwartz | Linnea Micciulla | J. Makhoul
