Important Citation Identification by Exploiting the Optimal In-text Citation Frequency

Research always builds on earlier work, and researchers cite that work to acknowledge the contributions of their predecessors. Citations are used to compute journal impact factors, rank researchers, identify emerging research topics, and allocate research grants, among other purposes. Recently, the research community has turned its attention to the citations themselves and has come to the view that not all citations are equally important. To identify important citations, researchers have used context-based, cue-word-based, metadata-based, frequency-based, and text-based approaches, among others. Of these, the frequency-based approach has been used extensively: a citation with a high frequency is considered important, but it remains unclear what frequency cut-off a citation must reach to be considered important. This research explores the significance of applying a threshold value to the frequency count for binary classification. We identify the optimal threshold value of the frequency count and apply it to classify citations as important or non-important. To evaluate the proposed approach, we used a benchmark data set of 465 citation pairs annotated by two domain experts. Compared with the state-of-the-art precision of 0.72, the proposed approach achieved a higher precision of 0.75.
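
The thresholding idea described above can be illustrated with a minimal sketch. It assumes that "optimal" means the cut-off that maximizes precision on annotated data; the abstract does not state the selection criterion, so this is an assumption. The function names and the toy data are hypothetical and are not taken from the paper or its benchmark.

```python
from typing import List, Optional, Tuple


def precision_at_threshold(freqs: List[int], labels: List[bool], threshold: int) -> float:
    """Precision when a citation is predicted important if its frequency >= threshold."""
    predicted = [f >= threshold for f in freqs]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    return tp / (tp + fp) if (tp + fp) else 0.0


def best_threshold(freqs: List[int], labels: List[bool]) -> Tuple[Optional[int], float]:
    """Scan every observed frequency as a candidate cut-off.

    Returns the smallest cut-off that reaches the highest precision
    (assumed selection criterion, not confirmed by the source).
    """
    best_t, best_p = None, -1.0
    for t in sorted(set(freqs)):
        p = precision_at_threshold(freqs, labels, t)
        if p > best_p:  # strict '>' keeps the smallest threshold on ties
            best_t, best_p = t, p
    return best_t, best_p


# Hypothetical toy data: in-text citation counts and expert labels (True = important).
toy_freqs = [1, 1, 2, 2, 3, 4, 5, 6]
toy_labels = [False, False, False, True, False, True, True, True]

threshold, precision = best_threshold(toy_freqs, toy_labels)
print(threshold, precision)
```

Maximizing precision alone tends to favor high cut-offs that label very few citations as important, so in practice one might also track recall or F-measure when choosing the threshold.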
