Important citation identification using sentiment analysis of in-text citations

Abstract A citation represents the relationship between a citing document and the document it cites. Citations are widely used to measure different aspects of knowledge-based achievement, such as institutional ranking, author ranking, journal impact factor, research grants, and peer judgments. A fair evaluation of research requires both quantitative and qualitative assessment of citations. To perform qualitative analysis, researchers have tried to classify citations into two classes (i.e., important and non-important), using metadata, content, citation counts, cue words or phrases, sentiment analysis, keywords, and machine learning approaches. However, the state-of-the-art results of this binary classification are inadequate for assessing the different aspects of researchers and their work. Therefore, this research proposes an approach to binary classification based on sentiment analysis of in-text citations, which improves on the state-of-the-art results. Different machine learning-based models are evaluated to determine the sentiment of in-text citations. The resulting sentiments are then used to compute positive, negative, and neutral citation counts. Furthermore, cosine similarity scores between paper-citation pairs are calculated. The sentiment counts and cosine similarity scores are then used as features for binary classification, which is performed with SVM, KLR, and Random Forest. The proposed approach is evaluated and compared with two state-of-the-art approaches on benchmark datasets. With a random forest classification model, the proposed approach achieves an F-measure of 0.83 on dataset 1, an improvement of 13.6%, and 0.67 on dataset 2, an improvement of 8%.
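The feature construction described in the abstract can be illustrated with a minimal sketch. It assumes that each in-text citation has already been labelled "positive", "negative", or "neutral" by some sentiment model, and that cosine similarity is computed over raw term-frequency vectors; the abstract does not specify the text representation, so that choice, along with the function names `tokenize`, `cosine_similarity`, and `build_features`, is hypothetical and not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: plain term counts stand in for whatever representation
    the paper actually uses.
    """
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so report no similarity
    return dot / (norm_a * norm_b)


def build_features(sentiment_labels, citing_text, cited_text):
    """Build one feature vector for a paper-citation pair.

    sentiment_labels: hypothetical per-citation labels from a sentiment model.
    Returns [positive count, negative count, neutral count, cosine similarity].
    """
    counts = Counter(sentiment_labels)
    return [
        counts["positive"],
        counts["negative"],
        counts["neutral"],
        cosine_similarity(citing_text, cited_text),
    ]
```

A vector built this way could then be passed to any binary classifier, such as a random forest, to predict whether the citation is important or non-important.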
