Evaluating DUC 2004 Tasks with the QARLA Framework

This paper reports the application of the QARLA evaluation framework to the DUC 2004 testbed (tasks 2 and 5). Our experiment addresses two questions: how well the QARLA evaluation measures correlate with human judgements, and what additional insights the QARLA framework can provide to the DUC evaluation exercises.

Julio Gonzalo | M. Felisa Verdejo | Anselmo Peñas | Enrique Amigó
