Evaluation of Sentence Selection for Speech Summarization

In the last several years, a number of papers have addressed the area of automatic speech summarization. Many of them have applied evaluation metrics adapted from those used in speech recognition research, rather than from those used in text summarization. We consider whether ASR-inspired evaluation metrics produce different results from those taken from text summarization, and why. We evaluate various standard summarizers as well as our own systems on a subset of the SWITCHBOARD spoken dialogue dataset with both kinds of metrics. We find a statistically significant difference between the two classes of metrics in how they rank these systems. Our preliminary conclusion is that considerably greater caution must be exercised when using ASR-based measures than we have witnessed to date in the speech summarization literature.
