Evaluation of machine translation and its evaluation

Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their harmonic mean, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The relevant software is publicly available from http://nlp.cs.nyu.edu/GTM/.
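To make the measures referred to above concrete, the following is a minimal Python sketch of unigram precision, recall, and F-measure for a candidate translation against a single reference. It counts matches as clipped unigram overlaps rather than the maximum-matching computation used by the GTM software, and the function and variable names are illustrative, not part of that software.

```python
from collections import Counter

def unigram_f_measure(candidate, reference):
    """Unigram precision, recall, and F-measure between a candidate
    translation and a reference, both given as lists of tokens.
    Matches are the multiset overlap of unigram counts (clipped counts)."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Each unigram is matched at most as often as it appears on either side.
    matches = sum((cand_counts & ref_counts).values())
    precision = matches / len(candidate) if candidate else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure: the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Illustrative usage with simple whitespace tokenization.
cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(unigram_f_measure(cand, ref))
```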