LaSTUS-TALN+INCO @ CL-SciSumm 2019

In this paper, we present several systems developed for the 4th Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019), which addresses the problem of summarizing a scientific paper using information from its citation network (i.e., the papers that cite the given paper). Given a cluster of scientific documents in which one is a reference paper (RP) and the remaining documents are papers citing it, two tasks are proposed: (i) to identify which sentences in the reference paper are being cited and why they are cited, and (ii) to produce a citation-based summary of the reference paper using the information in the cluster. Our systems are based on both supervised techniques (LSTM and convolutional neural networks) and unsupervised techniques, using word embedding representations and features computed from the linguistic and semantic analysis of the documents.
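
As a rough illustration of task (i), an unsupervised matcher could rank reference-paper sentences by the cosine similarity between averaged word embeddings of a citing sentence and of each candidate. The sketch below is an assumption for illustration only and is not the system described in this paper: the three-dimensional vectors in `TOY_EMBEDDINGS`, the function names, and the example sentences are all invented, and a real system would load pretrained embeddings and apply proper tokenization.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load pretrained embeddings instead.
TOY_EMBEDDINGS = {
    "summarization": [0.9, 0.1, 0.0],
    "summary": [0.8, 0.2, 0.0],
    "citation": [0.1, 0.9, 0.0],
    "network": [0.0, 0.7, 0.3],
    "parser": [0.0, 0.1, 0.9],
    "syntax": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in sentence.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reference_sentences(citance, reference_sentences):
    """Return (sentence_index, score) pairs, most similar to the citance first."""
    citance_vec = sentence_vector(citance)
    scored = []
    for index, sentence in enumerate(reference_sentences):
        vec = sentence_vector(sentence)
        if citance_vec is None or vec is None:
            score = 0.0  # no known words, so no evidence of similarity
        else:
            score = cosine(citance_vec, vec)
        scored.append((index, score))
    # Sort by descending score; ties keep document order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


reference_paper = [
    "we build a parser for syntax",
    "our summarization system uses a citation network",
    "the summary is short",
]
citing_sentence = "they propose citation based summarization"
ranking = rank_reference_sentences(citing_sentence, reference_paper)
```

With these toy vectors, the second reference sentence ranks first because it shares both the summarization and the citation vocabulary with the citing sentence. The top-ranked sentences would then be taken as candidates for the cited text span.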
