What Sentence are you Referring to and Why? Identifying Cited Sentences in Scientific Literature

In the current context of scientific information overload, text mining tools are of paramount importance for researchers who have to read scientific papers and assess their value. Current citation networks, which link papers by citation relationships (reference and citing paper), are useful for quantitatively understanding the value of a piece of scientific work; however, they are limited in that they do not indicate which specific part of the reference paper the citing paper refers to. This qualitative information is very important, for example, in the context of current community-based scientific summarization activities. In this paper, relying on an annotated dataset of co-citation sentences, we carry out a number of experiments aimed at automatically identifying, given a citation sentence, the part of the reference paper being cited. Additionally, our algorithm predicts the specific reason, out of five possible reasons, why that reference sentence has been cited.
