Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization

We present a fully unsupervised extractive text summarization system built on a submodularity framework introduced in prior work. The framework allows summaries to be generated greedily while preserving near-optimal performance guarantees. Our main contribution is a novel coverage reward term in the objective function that the greedy algorithm optimizes. This term builds on the graph-of-words representation of text and the k-core decomposition algorithm to assign meaningful scores to words. We evaluate our approach on the AMI and ICSI meeting speech corpora and on the DUC2001 news corpus, and reach state-of-the-art performance on all datasets. Results indicate that our method is particularly well suited to the meeting domain.
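
The pipeline outlined above (graph-of-words, k-core word scores, greedy selection under a budget) can be illustrated with a minimal Python sketch. It is not the system's actual implementation, and every detail in it is an assumption made for illustration: the window size, the unweighted graph, the use of raw core numbers as word scores, the plain weighted-coverage objective, the gain-per-word selection rule, and all function names (`graph_of_words`, `core_numbers`, `coverage`, `greedy_summary`). The abstract does not specify the real objective function, which may include further terms.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Undirected co-occurrence graph: link words that share a sliding window."""
    adj = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                adj[word].add(other)
                adj[other].add(word)
    return adj


def core_numbers(adj):
    """Unweighted k-core decomposition by repeatedly peeling a minimum-degree node."""
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def coverage(selected, sentences, scores):
    """Assumed coverage reward: total score of the distinct words covered."""
    covered = set()
    for index in selected:
        covered.update(sentences[index])
    return sum(scores.get(word, 0) for word in covered)


def greedy_summary(sentences, scores, budget):
    """Repeatedly add the sentence with the best coverage gain per word."""
    selected, used = [], 0
    while True:
        base = coverage(selected, sentences, scores)
        best, best_ratio = None, 0.0
        for index, sentence in enumerate(sentences):
            if index in selected or not sentence:
                continue
            if used + len(sentence) > budget:  # budget is counted in words
                continue
            gain = coverage(selected + [index], sentences, scores) - base
            ratio = gain / len(sentence)
            if ratio > best_ratio:
                best, best_ratio = index, ratio
        if best is None:
            return sorted(selected)
        selected.append(best)
        used += len(sentences[best])


if __name__ == "__main__":
    # Toy input: each sentence is a list of already preprocessed tokens.
    sentences = [
        ["budget", "remote", "design"],
        ["remote", "design", "button"],
        ["lunch", "today"],
    ]
    tokens = [word for sentence in sentences for word in sentence]
    scores = core_numbers(graph_of_words(tokens))
    print(greedy_summary(sentences, scores, budget=5))
```

Words that sit in denser parts of the co-occurrence graph receive higher core numbers, so sentences covering them contribute more to the assumed coverage reward. Because a covered word is counted only once, each additional sentence yields diminishing gains, which is the property that makes greedy selection a natural fit.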
