Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?
[1] Robert Tibshirani, et al. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy, 1986.
[2] George R. Doddington, et al. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, 2002.
[3] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[4] Andrei Popescu-Belis. An experiment in comparative evaluation: humans vs. computers, 2003, MTSUMMIT.
[5] Christopher Culy, et al. The limits of n-gram translation evaluation metrics, 2003, MTSUMMIT.