A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies

Citation analysis is one of the most frequently used methods in research evaluation. Citation analysis based on bibliometric metadata has grown significantly, primarily due to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. With better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics by tapping into advances in full-text data processing techniques to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation context and content analysis, citation classification, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article presents a narrative review of the studies on these developments. Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.
