Mining citation information from CiteSeer data

The CiteSeer digital library is a useful source of bibliographic information. It allows retrieval of citations, co-authorships, addresses, and affiliations of authors and publications. Despite this, it has seldom been used for automated citation analyses. This article describes our findings from extensive mining of the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods, including citation and in-degree counts, HITS, PageRank, and variations of PageRank based on both the citation and collaboration graphs. We compare the resulting rankings with lists of computer science award winners and find that award recipients are almost always ranked highly. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods.
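
To make two of the ranking methods named above concrete, the following is a minimal sketch of in-degree counting and plain PageRank on an author citation graph. It is an illustration only, not the implementation used in this article: the function names `in_degree` and `pagerank`, the damping factor of 0.85, the handling of authors who cite nobody, and the toy edge list are all assumptions made for the example.

```python
from collections import defaultdict


def in_degree(edges):
    """Count the distinct authors citing each author (self-citations ignored)."""
    citing = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            citing[dst].add(src)
    return {author: len(sources) for author, sources in citing.items()}


def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on a directed author citation graph.

    Assumption: rank held by authors with no outgoing citations is spread
    evenly over all authors, so the ranks always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: set() for node in nodes}
    for src, dst in edges:
        if src != dst:
            out_links[src].add(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy data: an edge (X, Y) means author X cites author Y.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "B")]
counts = in_degree(edges)
ranks = pagerank(edges)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, counts.get(author, 0), round(ranks[author], 4))
```

In-degree only counts how many authors cite someone, whereas PageRank also weights each citation by the standing of the citing author, which is why the two methods can order the same authors differently.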
