Single-Document Summarization Using Sentence Embeddings and K-Means Clustering

This paper proposes a novel method for extractive single-document summarization using K-Means clustering and sentence embeddings. The K-Means algorithm groups the sentence embeddings into clusters, with the number of clusters determined by the required summary size. Sentences within a cluster carry similar information, so for each cluster a ridge regression sentence scoring model selects the most appropriate sentence for inclusion in the summary. ROUGE evaluation of summaries of various lengths on the DUC 2001 dataset demonstrated the effectiveness of the approach.
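
The cluster-then-select pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn for both K-Means and ridge regression, and the function names `fit_scorer` and `summarize`, the 2-D toy embeddings, the single per-sentence feature, and the training targets are all hypothetical placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge


def fit_scorer(train_features, train_scores, alpha=1.0):
    """Fit a ridge regression sentence scorer.

    The features and target scores are assumptions made for this sketch;
    the abstract does not specify which features the model uses.
    """
    return Ridge(alpha=alpha).fit(train_features, train_scores)


def summarize(embeddings, scores, n_sentences, seed=0):
    """Pick one sentence per cluster and return indices in document order."""
    # One cluster per summary sentence, so summary size sets the cluster count.
    km = KMeans(n_clusters=n_sentences, n_init=10, random_state=seed)
    labels = km.fit_predict(embeddings)
    chosen = []
    for cluster_id in range(n_sentences):
        members = np.flatnonzero(labels == cluster_id)
        # Keep the highest-scoring sentence of this cluster.
        best = members[np.argmax(scores[members])]
        chosen.append(int(best))
    return sorted(chosen)


# Toy sentence embeddings: six sentences forming three well-separated groups.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.2],      # sentences 0, 1
    [10.0, 10.0], [10.2, 9.9],   # sentences 2, 3
    [20.0, 0.0], [19.8, 0.1],    # sentences 4, 5
])

# Hypothetical one-dimensional feature per sentence (placeholder values).
features = np.array([[0.9], [0.2], [0.3], [0.8], [0.5], [0.6]])

# Toy training set for the scorer: the target simply grows with the feature.
scorer = fit_scorer(np.array([[0.0], [0.5], [1.0]]), np.array([0.0, 0.5, 1.0]))
scores = scorer.predict(features)

summary_idx = summarize(embeddings, scores, n_sentences=3)
print(summary_idx)
```

Returning the selected indices in document order keeps the extracted sentences in their original sequence, which is a design choice of this sketch rather than something the abstract states.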
