Improving the accuracy of co-citation clustering using full text

Historically, co-citation models have been based only on bibliographic information. Full-text analysis offers the opportunity to significantly improve the quality of the signals on which these models are based. In this work, we study the effect of reference proximity on the accuracy of co-citation clusters. Using a corpus of 270,521 full-text documents from 2007, we compare traditional co-citation clustering, which uses only bibliographic information, with co-citation clustering in which the proximity between reference pairs in the citing text is factored into the pairwise relationships. We find that accounting for reference proximity from full text can increase the textual coherence (a measure of accuracy) of a co-citation cluster solution by up to 30% over the traditional approach.
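
The contrast between the two pairwise signals can be illustrated with a minimal sketch. It assumes each citing document is available as a list of (reference id, character offset) mentions; the function names and the distance thresholds and weights below are hypothetical choices made for illustration and are not taken from the abstract.

```python
from collections import defaultdict
from itertools import combinations


def proximity_weight(distance):
    """Map a character distance between two reference mentions to a weight.

    The thresholds and weights are illustrative assumptions only.
    """
    if distance == 0:
        return 4  # cited at the same position, e.g. the same bracket
    if distance <= 400:
        return 3  # roughly the same sentence
    if distance <= 1500:
        return 2  # roughly the same paragraph
    return 1      # elsewhere in the same document


def pair_scores(documents, weight_fn=None):
    """Accumulate co-citation scores over all citing documents.

    documents: iterable of lists of (ref_id, char_offset) mentions.
    weight_fn: None for traditional counting (each co-cited pair adds 1
               per document), or a function of the minimum distance
               between mentions of the two references.
    """
    scores = defaultdict(float)
    for mentions in documents:
        # Group mention offsets by reference within this document.
        positions = defaultdict(list)
        for ref_id, offset in mentions:
            positions[ref_id].append(offset)
        # Score every unordered pair of distinct references once.
        for a, b in combinations(sorted(positions), 2):
            if weight_fn is None:
                scores[(a, b)] += 1
            else:
                closest = min(
                    abs(x - y) for x in positions[a] for y in positions[b]
                )
                scores[(a, b)] += weight_fn(closest)
    return dict(scores)


# Two toy citing documents (hypothetical data).
docs = [
    [("A", 100), ("B", 100), ("C", 5000)],
    [("A", 10), ("C", 200)],
]
traditional = pair_scores(docs)
weighted = pair_scores(docs, weight_fn=proximity_weight)
```

In this toy input, the traditional counts rank the pair (A, C) above (A, B) because A and C are co-cited in two documents, while the weighted scores treat the two pairs as equally related because A and B are cited at the same position. Either score table could then be passed to a clustering step, which is outside the scope of this sketch.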
