Evaluating Information Content by Factoid Analysis: Human annotation and stability

We present a new approach to intrinsic summary evaluation, based on initial experiments in van Halteren and Teufel (2003), which combines two novel aspects: comparison of information content (rather than string similarity) between gold standard and system summary, measured in shared atomic information units which we call factoids, and comparison against more than one gold standard summary (in our data, 20 and 50 summaries for the two test texts, respectively). In this paper, we show that factoid annotation is highly reproducible, introduce a weighted factoid score, estimate how many summaries are required for stable system rankings, and show that factoid scores cannot be sufficiently approximated by unigram overlap or by the DUC information overlap measure.
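To make the factoid-based comparison concrete, the following minimal sketch (our illustration, not the paper's implementation) scores a system summary against factoid annotations of multiple gold-standard summaries. It assumes factoids are represented as identifiers in sets, weights each factoid by the fraction of human summaries containing it, and normalizes by the total weight mass; the function names and the exact normalization are assumptions, not the paper's definitions.

```python
from collections import Counter

def factoid_weights(gold_factoid_sets):
    """Weight each factoid by the fraction of gold-standard summaries
    containing it. gold_factoid_sets: list of sets of factoid identifiers,
    one set per human summary (hypothetical representation; the paper's
    factoids are manually annotated atomic information units)."""
    counts = Counter()
    for factoids in gold_factoid_sets:
        counts.update(factoids)
    n = len(gold_factoid_sets)
    return {f: c / n for f, c in counts.items()}

def weighted_factoid_score(system_factoids, weights):
    """Score a system summary as the weight mass of the factoids it
    covers, normalized by the total weight mass of all gold factoids
    (an assumed normalization, for illustration only)."""
    covered = sum(w for f, w in weights.items() if f in system_factoids)
    total = sum(weights.values())
    return covered / total if total else 0.0

# Toy usage with hypothetical factoid labels:
gold = [{"F1", "F2", "F3"}, {"F1", "F3"}, {"F1", "F4"}]
weights = factoid_weights(gold)
print(weighted_factoid_score({"F1", "F3"}, weights))  # 5/7 ~= 0.714
```

Under this weighting, a system summary earns more credit for covering factoids that many human summarizers agreed were worth including, which is the intuition behind scoring against multiple gold standards rather than a single one.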