Summarization evaluation for text and speech: issues and approaches

This paper surveys current approaches to evaluating text and speech summarization. It discusses their advantages and disadvantages, with the goal of identifying the evaluation techniques most suitable for speech summarization. Precision/recall schemes, as well as summary accuracy measures that incorporate weightings based on multiple human decisions, are suggested as particularly suitable for evaluating speech summaries.
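As a rough illustration of the two families of measures named above, the following minimal sketch scores an extractive summary, represented as a set of selected sentence (or utterance) indices, first by precision/recall against a single human selection and then by a score weighted by how many annotators chose each unit. The function names, the toy data, and the normalization used in the weighted score (dividing by the best total achievable for a summary of the same size) are assumptions made for illustration; they are not the formulation of any specific measure surveyed in the paper.

```python
from collections import Counter


def precision_recall(selected, reference):
    """Precision, recall and F1 of selected unit ids against one human selection."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_score(selected, annotator_selections):
    """Vote-weighted score (illustrative assumption, not a published definition).

    Each unit is worth the number of annotators who selected it. The total
    earned by the system is divided by the best total any summary with the
    same number of units could earn.
    """
    votes = Counter(u for chosen in annotator_selections for u in set(chosen))
    selected = set(selected)
    gained = sum(votes[u] for u in selected)
    best = sum(sorted(votes.values(), reverse=True)[:len(selected)])
    return gained / best if best else 0.0


# Hypothetical toy data: three annotators and one system summary.
annotators = [[0, 2, 5], [0, 2, 7], [0, 3, 5]]
system = [0, 3, 9]

p, r, f1 = precision_recall(system, annotators[0])
w = weighted_score(system, annotators)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} weighted={w:.3f}")
```

With a single reference, unit 3 counts as a plain error even though another annotator selected it; the weighted score gives it partial credit, which is the motivation for drawing on multiple human decisions.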
