A Semi-automatic Evaluation Scheme

The evaluation of many NLP applications shares a two-step procedure: reference data are processed and aggregated into a gold-standard data set, which is then compared against peer system results. Traditionally, the first step has been performed by human annotators, and the second has been conducted either manually or automatically. As a move toward a fully automated evaluation procedure, we propose a novel semi-automatic evaluation scheme in which both reference and peer data are automatically nuggetized and a carefully designed annotation procedure governs the subsequent comparisons. High inter-annotator agreement on both reference and peer annotations shows that machine-produced nuggets are informative and can be used in evaluation settings. In addition, standardizing the nugget creation process affords us the opportunity to look beyond surface-level phrasal differences toward semantic equivalence.
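To make the two-step structure concrete, the sketch below outlines one possible realization in Python. It is an illustration under assumptions, not the system described here: the sentence-splitting nuggetizer, the `judge` callback standing in for the human annotator, and all names (`Nugget`, `auto_nuggetize`, `evaluate`) are hypothetical.

```python
# Illustrative sketch of a semi-automatic nugget-based evaluation.
# The nuggetizer and the manual match judgment are placeholders (assumptions),
# not the actual components of the proposed scheme.

from dataclasses import dataclass


@dataclass
class Nugget:
    text: str       # a minimal, self-contained unit of information
    source_id: str  # identifier of the reference or peer document it came from


def auto_nuggetize(documents: dict[str, str]) -> list[Nugget]:
    """Step 1 (automatic): split each document into candidate nuggets.

    Sentences are naively treated as nuggets here; a real nuggetizer would
    use richer syntactic or semantic segmentation.
    """
    nuggets = []
    for doc_id, text in documents.items():
        for sentence in text.split(". "):
            sentence = sentence.strip().rstrip(".")
            if sentence:
                nuggets.append(Nugget(text=sentence, source_id=doc_id))
    return nuggets


def evaluate(reference_docs: dict[str, str],
             peer_docs: dict[str, str],
             judge) -> float:
    """Step 2 (semi-automatic): an annotator judges, for each peer nugget,
    whether it matches a reference nugget; recall over reference nuggets
    is reported. `judge(peer_nugget, ref_nugget) -> bool` stands in for
    the human comparison step.
    """
    ref_nuggets = auto_nuggetize(reference_docs)
    peer_nuggets = auto_nuggetize(peer_docs)

    matched = set()
    for p in peer_nuggets:
        for i, r in enumerate(ref_nuggets):
            if i not in matched and judge(p, r):
                matched.add(i)
                break
    return len(matched) / len(ref_nuggets) if ref_nuggets else 0.0
```

In this sketch the only human-in-the-loop step is the match judgment itself; nugget creation on both the reference and peer sides is fully automatic, which is what makes the overall scheme semi-automatic.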