A Supervised Approach to Extractive Summarisation of Scientific Papers

Automatic summarisation is a popular approach to reducing a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist, and none for the traditionally popular domain of scientific publications, which opens up challenging research avenues centred on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries, and show straightforward ways of extending it further. We develop models on the dataset that make use of both neural sentence encoding and traditionally used summarisation features, and show that models which encode sentences as well as their local and global context perform best, significantly outperforming well-established baseline methods.
