Semantic translation error rate for evaluating translation systems

In this paper, we introduce a new metric, the semantic translation error rate (STER), for evaluating the performance of machine translation systems. STER builds on the previously published translation error rate (TER) (Snover et al., 2006) and METEOR (Banerjee and Lavie, 2005) metrics. Specifically, STER extends TER in two ways: first, by incorporating the word equivalence measures (WordNet synonymy and Porter stemming) standardly used by METEOR, and second, by disallowing alignments of concept words to non-concept words (i.e., stop words). We show how these features make STER alignments better suited to human-driven analysis than standard TER alignments. We also present experimental results showing that STER correlates better with human judgments than TER does. Finally, we compare STER to METEOR and illustrate that METEOR scores computed using STER alignments have statistical properties similar to those of METEOR scores computed using METEOR alignments.
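To make the two extensions concrete, the sketch below shows how a STER-style word matcher might decide whether a hypothesis word may align to a reference word. This is an illustrative assumption, not the authors' implementation: `SYNONYMS` is a tiny stand-in for WordNet synset lookup, `crude_stem` is a toy stand-in for the Porter stemmer, and `STOP_WORDS` stands in for whatever stop-word list the metric uses to separate concept from non-concept words.

```python
# Hedged sketch of STER-style word matching; not the published implementation.

STOP_WORDS = {"the", "a", "an", "of", "to", "in"}  # assumed stop-word list

SYNONYMS = {  # hypothetical stand-in for WordNet synonym lookup
    "quick": {"fast", "rapid"},
    "fast": {"quick", "rapid"},
}

def crude_stem(word):
    """Toy suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def can_align(hyp_word, ref_word):
    """Return True if this sketch would allow aligning the two words."""
    h, r = hyp_word.lower(), ref_word.lower()
    # STER's constraint: concept words may not align to stop words.
    if (h in STOP_WORDS) != (r in STOP_WORDS):
        return False
    if h == r:                                # exact match (plain TER)
        return True
    if crude_stem(h) == crude_stem(r):        # stem equivalence
        return True
    return r in SYNONYMS.get(h, set())        # synonym equivalence
```

Under these assumptions, "walked" may align to "walks" (shared stem) and "quick" to "fast" (synonyms), while "the" may not align to "quick" because a stop word cannot match a concept word.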