Real-Time Keyword Extraction from Conversations

We introduce a novel method for extracting keywords from meeting speech in real time. Our approach builds on the graph-of-words representation of text and leverages the k-core decomposition algorithm and the properties of submodular functions. We outperform multiple baselines in a real-time scenario emulated from the AMI and ICSI meeting corpora. Evaluation is conducted against both extractive and abstractive gold standards, using two standard performance metrics and a newer one based on word embeddings.
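
To make the two building blocks named above concrete, here is a minimal sketch of a graph-of-words and its k-core decomposition. It is not the method evaluated in this work. It assumes an unweighted, undirected graph, a fixed sliding window, and already-tokenized input, and it leaves out the submodular selection step entirely. The names `build_graph_of_words`, `core_numbers`, and `main_core` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Link each token to the tokens that follow it inside a sliding window.

    Assumption: unweighted, undirected edges; `window` counts the current
    token, so window=2 links only adjacent tokens.
    """
    adj = defaultdict(set)
    for i, u in enumerate(tokens):
        adj[u]  # register the node even if it ends up with no edges
        for v in tokens[i + 1 : i + window]:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Compute core numbers by repeatedly peeling a minimum-degree node.

    A node's core number is the largest k such that it belongs to a
    subgraph in which every node has degree at least k.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])  # k never decreases during peeling
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def main_core(adj):
    """Return the words with the highest core number (candidate keywords)."""
    core = core_numbers(adj)
    if not core:
        return set()
    k_max = max(core.values())
    return {u for u, k in core.items() if k == k_max}


# Toy usage: a triangle of words (a, b, c) with one loosely attached word (d).
graph = build_graph_of_words(["a", "b", "c", "a", "d"], window=2)
keywords = main_core(graph)
```

In this toy graph the densely connected words form the main core, while the word attached by a single edge is peeled away first. The linear-scan peeling is chosen for readability; a bucket-based implementation would be needed for anything beyond small inputs.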
