Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations. Our work combines the strengths of multiple recent approaches while addressing their weaknesses. Moreover, we leverage recent advances in word embeddings and graph degeneracy applied to NLP to take external semantic knowledge into account and to design custom diversity and informativeness measures. Experiments on the AMI and ICSI corpora show that our system improves on the state of the art. Code and data are publicly available, and our system can be interactively tested.
