Pattern Analysis of Citation-Anchors in Citing Documents for Accurate Identification of In-Text Citations

Citations play an important role in the ranking of authors, journals, institutions, and organizations. A citing document often cites the same reference several times in its full text, and these repeated in-text citations are used in many application scenarios, such as: 1) finding the relationship between cited and citing papers; 2) identifying influential cited papers among the set of references in a citing paper; 3) identifying suitable citation functions; and 4) studying in-text citations in different logical sections of papers to draw different conclusions. The accurate identification of in-text citations remains an open area of research. Recently, work on the complexities of automatically identifying in-text citations reported an accuracy rate of only 58%. This low rate stems from a number of issues highlighted by state-of-the-art research. This paper investigates these issues in further detail: 1) by building on previous research; 2) by analyzing different referencing formats; and 3) by experimenting on a comprehensive data set. Based on this investigation, the paper proposes a taxonomy and a workable system that uses a set of heuristics built from the detailed study. The proposed model is then applied to an unseen, diversified data set taken from the Journal of Universal Computer Science and CiteSeer. The proposed model achieved an average F-score of 0.97, compared with 0.58 for the baseline.
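
The abstract does not spell out the proposed heuristics, so the following is only a rough illustration of the underlying problem, not the paper's method. It is a minimal Python sketch, assuming two common referencing formats (numeric anchors such as [5] or [4-6], and author-year anchors such as (Small, 1973)), that counts how often one reference is cited in a document's text, plus the balanced F-score (F1) that could be used to compare detected anchors against a hand-labelled ground truth. The `Reference` class, the regular expressions, and every function name are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Reference:
    """Hypothetical reference-list entry; field names are illustrative only."""
    number: int        # position in a numbered reference list, e.g. 5 for "[5]"
    first_author: str  # first author's surname, e.g. "Small"
    year: str          # publication year as printed, e.g. "1973"


# Hypothetical pattern for numeric anchors: [5], [2, 7], [4-6] (hyphen or en dash).
NUMERIC_ANCHOR = re.compile(r"\[(\d+(?:\s*[-–,]\s*\d+)*)\]")


def numeric_targets(body: str) -> set:
    """Expand an anchor body like '2, 4-6' into {2, 4, 5, 6}."""
    targets = set()
    for part in body.split(","):
        # int() tolerates surrounding whitespace, so ' 4 ' parses as 4.
        bounds = [int(b) for b in re.split(r"[-–]", part)]
        targets.update(range(bounds[0], bounds[-1] + 1))
    return targets


def author_year_pattern(ref: Reference) -> re.Pattern:
    """Match 'Small (1973)', '(Small, 1973)' or 'Small et al. 1973b'."""
    return re.compile(
        rf"\b{re.escape(ref.first_author)}\b"  # surname as a whole word
        r"(?:\s+et\s+al\.?)?"                  # optional 'et al.'
        r",?\s*\(?"                            # optional comma / opening parenthesis
        rf"{re.escape(ref.year)}[a-z]?\b"      # year, optionally suffixed: 1973a
    )


def count_in_text_citations(text: str, ref: Reference) -> int:
    """Count anchors in `text` that point at `ref`, in either format."""
    numeric = sum(
        1 for m in NUMERIC_ANCHOR.finditer(text)
        if ref.number in numeric_targets(m.group(1))
    )
    author_year = sum(1 for _ in author_year_pattern(ref).finditer(text))
    return numeric + author_year


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = "Co-citation [2] was extended [2, 4-6] and revisited [5]."
    print(count_in_text_citations(doc, Reference(5, "Small", "1973")))  # 2
```

For brevity the sketch scans for both formats in one pass; a fuller system would more likely detect the referencing format first. Even this small example misses anchors such as "Small and Griffith (1973)", which hints at why purely pattern-based matching struggles across diverse documents.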
