Leveraging full-text article exploration for citation analysis

Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the full text of that article. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet in which the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the cited article often include cited text snippets as well. Hence, there is a need to understand the extent to which exploring the full text of the cited article would help to gain insights into the citing snippet, also considering that full-text access may be restricted. To this end, we propose a classification approach that automatically predicts whether the cited snippets in the full text of the cited article contain a significant amount of new content beyond the title and abstract. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. Experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between cases that call for full-text exploration and cases in which the title and abstract suffice.
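As a rough illustration of the section-level analysis described above, the following sketch scores each section of a cited article against a citing snippet. It is a minimal sketch under stated assumptions, not the method used in the paper: the abstract does not name the similarity measure, so plain term-frequency cosine similarity is assumed here, and the function names (`cosine_similarity`, `rank_sections`, `needs_full_text`), the section keys, and the toy texts are all hypothetical. The `needs_full_text` rule is only one possible way to derive a label for the "full-text exploration" versus "title and abstract only" cases; the paper trains a classifier for that decision.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts.

    Assumption: the paper's actual similarity measure is not given in the
    abstract; term-frequency cosine is used here only for illustration.
    """
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sections(citing_snippet, sections):
    """Return (section_name, similarity) pairs, most similar section first."""
    scores = {
        name: cosine_similarity(citing_snippet, text)
        for name, text in sections.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def needs_full_text(citing_snippet, sections, front_matter=("title", "abstract")):
    """Hypothetical labelling rule: True when some body section is more
    similar to the citing snippet than the title and abstract are."""
    front = max(
        (cosine_similarity(citing_snippet, sections[name])
         for name in front_matter if name in sections),
        default=0.0,
    )
    body = max(
        (cosine_similarity(citing_snippet, text)
         for name, text in sections.items() if name not in front_matter),
        default=0.0,
    )
    return body > front


# Toy cited article (hypothetical content, for illustration only).
cited_sections = {
    "title": "document embeddings from citations",
    "abstract": "we learn document embeddings using citations",
    "method": "the model is trained with a triplet loss over citation pairs",
}
citing_snippet = "the model is trained with a triplet loss"

for section_name, score in rank_sections(citing_snippet, cited_sections):
    print(f"{section_name}: {score:.3f}")
print("explore full text:", needs_full_text(citing_snippet, cited_sections))
```

In this toy case the citing snippet shares vocabulary only with the method section, so the rule flags the citation as one for which the title and abstract alone would not be enough. Labels derived this way could serve as training targets for a classifier, but that pipeline is an assumption and is not taken from the paper.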
