Summarizing Speech by Contextual Reinforcement of Important Passages

We explore the use of contextual information of the same type, namely speech transcriptions, to assess the relevant content of a single information source. We propose using topic-related additional information sources to contextualize the main input source and thereby improve the estimation of its most important passages. We analyse the impact of two kinds of additional information: the full topic-related stories, and only the passages from those stories that are closest to the passages of the input source to be summarized. A multi-document summarization framework based on Latent Semantic Analysis (LSA) provides the means to assess the relevant content. To minimize the influence of speech-related problems, we explore several term weighting strategies. Evaluation uses an information-theoretic measure, the Jensen-Shannon divergence, which does not require reference summaries.
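As an illustration of evaluating without reference summaries, the following is a minimal sketch of the Jensen-Shannon divergence between the word distribution of an input source and that of a candidate summary. It is not the evaluation setup used in this work: the whitespace tokenization, the absence of smoothing, the base-2 logarithm, and the function names (`word_distribution`, `js_divergence`, `summary_divergence`) are all assumptions made for the example.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary):
    """Relative frequency of each vocabulary word in text.

    Assumes naive lowercase whitespace tokenization and a non-empty text.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total for w in vocabulary]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def summary_divergence(source, summary):
    """JS divergence between source and summary word distributions (hypothetical helper)."""
    vocab = sorted(set(source.lower().split()) | set(summary.lower().split()))
    return js_divergence(word_distribution(source, vocab),
                         word_distribution(summary, vocab))
```

With a base-2 logarithm the value lies between 0 (identical distributions) and 1 (disjoint vocabularies), so under these assumptions a lower score suggests that the summary's word distribution stays closer to that of the input.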
