Paper Information - Single-Document Summarization Using Sentence Embeddings and K-Means Clustering

This paper proposes a novel method for extractive single-document summarization using K-Means clustering and sentence embeddings. The K-Means algorithm groups the sentence embeddings into clusters, with the number of clusters determined by the required summary size. Sentences within a cluster carry similar information, so for each cluster a ridge regression sentence scoring model picks the most appropriate sentence for inclusion in the summary. ROUGE evaluation of summaries of various lengths on the DUC 2001 dataset demonstrates the effectiveness of the approach.
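The cluster-then-select pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmeans_labels` and `summarize` are hypothetical, the embeddings are assumed to be precomputed, and the `scores` list only stands in for the output of the ridge regression scoring model, whose features the abstract does not specify. The sketch uses plain Lloyd's K-Means with one cluster per summary sentence and, as a further assumption, returns the selected sentences in document order.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: Sequence[Vector], k: int,
                  max_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(sentences: Sequence[str], embeddings: Sequence[Vector],
              scores: Sequence[float], num_sentences: int) -> List[str]:
    """Pick the highest-scoring sentence from each of num_sentences clusters.

    `scores` is an assumed stand-in for the ridge regression model's output.
    """
    labels = kmeans_labels(embeddings, num_sentences)
    chosen = []
    for c in range(num_sentences):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(max(members, key=lambda i: scores[i]))
    # Assumption: present the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    # Toy data: two well-separated groups of made-up 2-D "embeddings".
    toy_sentences = ["s0", "s1", "s2", "s3"]
    toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    toy_scores = [0.2, 0.9, 0.8, 0.1]
    print(summarize(toy_sentences, toy_embeddings, toy_scores, 2))
```

Tying the number of clusters to the number of summary sentences mirrors the abstract's statement that the cluster count depends on the required summary size. A real system would replace the toy vectors with actual sentence embeddings and the toy scores with predictions from a trained scoring model.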
