Unsupervised document summarization using pre-trained sentence embeddings and graph centrality

This paper describes our submission for the LongSumm task in SDP 2021. We propose a method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality in an unsupervised manner.The proposed method is simple, fast, can summarize any kind of document of any size and can satisfy any length constraints for the summaries produced. The method offers competitive performance to more sophisticated supervised methods and can serve as a proxy for abstractive summarization techniques

[1]  Shirui Pan,et al.  Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline , 2020, SDP.

[2]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[3]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[4]  Ewan Klein,et al.  Natural Language Processing with Python , 2009 .

[5]  Zhe Gan,et al.  Discourse-Aware Neural Extractive Model for Text Summarization , 2019, ArXiv.

[6]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[7]  Yao Zhao,et al.  PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2020, ICML.

[8]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[9]  Omer Levy,et al.  GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.

[10]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[11]  Jianfeng Gao,et al.  A Human Generated MAchine Reading COmprehension Dataset , 2018 .

[12]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[13]  Gholamreza Haffari,et al.  SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression , 2020, SIGIR.

[14]  Edward Gibson,et al.  Paragraph-, Word-, and Coherence-based Approaches to Sentence Ranking: A Comparison of Algorithm and Human Performance , 2004, ACL.

[15]  G Salton,et al.  Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts , 1994, Science.

[16]  Gerard Salton,et al.  Automatic Text Structuring and Summarization , 1997, Inf. Process. Manag..

[17]  Guy Lev,et al.  TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks , 2019, ACL.

[18]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[19]  K. G. Srinivasa,et al.  SentenceRank — A graph based approach to summarize text , 2014, The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014).

[20]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[21]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[22]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[23]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[24]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[25]  Dragomir R. Radev,et al.  LexRank: Graph-based Lexical Centrality as Salience in Text Summarization , 2004, J. Artif. Intell. Res..

[26]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[27]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[28]  Shirui Pan,et al.  SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline , 2020, ArXiv.

[29]  Wei Liu,et al.  CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization , 2020, SDP.