Characterizing and Mining the Citation Graph of the Computer Science Literature

Citation graphs representing a body of scientific literature convey measures of scholarly activity and productivity. In this work we present a study of the structure of the citation graph of the computer science literature. Using a web robot, we built several topic-specific citation graphs and their union graph from the digital library ResearchIndex. After verifying that the degree distributions follow a power law, we applied a series of graph-theoretic algorithms to elicit an aggregate picture of the citation graph in terms of its connectivity. We found a single large weakly connected component and a single large biconnected component, and confirmed the expected lack of a large strongly connected component. The large components remained even after removing the strongest authority nodes or the strongest hub nodes, indicating that such tight connectivity is widespread and does not depend on a small subset of important nodes. Finally, minimum cuts between authority papers of different areas did not result in a balanced partitioning of the graph into areas, pointing to the need for more sophisticated algorithms for clustering the graph.
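The connectivity measurements described above can be sketched in Python (3.9+, standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is assumed to fit in memory as a dict mapping each paper to the papers it cites; authority and hub strength are assumed to come from a plain HITS power iteration, since the text does not say how they were scored; and the toy data `CITATIONS` and every function name are hypothetical. Biconnected components, power-law fitting, and minimum cuts are left out for brevity.

```python
from collections import defaultdict
import math

Graph = dict[str, list[str]]  # hypothetical format: paper id -> ids it cites (edge u -> v)

def nodes_of(graph: Graph) -> set[str]:
    """All papers, including those that are cited but cite nothing themselves."""
    nodes = set(graph)
    for cited in graph.values():
        nodes.update(cited)
    return nodes

def _flood(start: str, neighbours, visited: set[str]) -> set[str]:
    """Collect every node reachable from `start` via `neighbours`, marking them visited."""
    visited.add(start)
    component, stack = set(), [start]
    while stack:
        u = stack.pop()
        component.add(u)
        for v in neighbours[u]:
            if v not in visited:
                visited.add(v)
                stack.append(v)
    return component

def weakly_connected_components(graph: Graph) -> list[set[str]]:
    """Components of the graph with citation directions ignored."""
    undirected = defaultdict(set)
    for u, cited in graph.items():
        for v in cited:
            undirected[u].add(v)
            undirected[v].add(u)
    seen: set[str] = set()
    return [_flood(s, undirected, seen) for s in nodes_of(graph) if s not in seen]

def strongly_connected_components(graph: Graph) -> list[set[str]]:
    """Kosaraju's algorithm, with an iterative DFS to avoid deep recursion."""
    reverse = defaultdict(list)
    for u, cited in graph.items():
        for v in cited:
            reverse[v].append(u)
    # Pass 1: record nodes in order of DFS finishing time.
    order, seen = [], set()
    for s in nodes_of(graph):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(graph.get(s, ())))]
        while stack:
            u, successors = stack[-1]
            for v in successors:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(graph.get(v, ()))))
                    break
            else:  # every successor explored, so u is finished
                order.append(u)
                stack.pop()
    # Pass 2: flood-fill the reversed graph in decreasing finishing time.
    assigned: set[str] = set()
    return [_flood(s, reverse, assigned) for s in reversed(order) if s not in assigned]

def hits(graph: Graph, iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Plain HITS power iteration (an assumed scoring method); returns (hub, authority)."""
    nodes = nodes_of(graph)
    hub, auth = {n: 1.0 for n in nodes}, {n: 0.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for u, cited in graph.items():
            for v in cited:
                auth[v] += hub[u]  # authority = sum of hub scores of citing papers
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # hub = sum of authority scores of the papers it cites
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

def top_k(scores: dict[str, float], k: int) -> set[str]:
    """The k highest-scoring papers; ties are broken by paper id."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])

def largest_wcc_after_removal(graph: Graph, removed: set[str]) -> int:
    """Size of the largest weakly connected component once `removed` papers are deleted."""
    keep = nodes_of(graph) - removed
    pruned = {u: [v for v in graph.get(u, ()) if v in keep] for u in keep}
    return max((len(c) for c in weakly_connected_components(pruned)), default=0)

# Hypothetical toy data; A, B and C form a small citation cycle so one nontrivial SCC exists.
CITATIONS: Graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"],
                    "D": ["C"], "E": ["C", "D"], "F": ["G"]}

if __name__ == "__main__":
    print(sorted(len(c) for c in weakly_connected_components(CITATIONS)))  # [2, 5]
    print(max(len(c) for c in strongly_connected_components(CITATIONS)))   # 3
    strongest = top_k(hits(CITATIONS)[1], 1)
    print(strongest, largest_wcc_after_removal(CITATIONS, strongest))      # {'C'} 2
```

On this toy graph, deleting the single strongest authority `C` shrinks the largest weakly connected component from 5 papers to 2, which is the kind of collapse the study reports did not occur on the ResearchIndex graphs. The three-paper cycle `A`, `B`, `C` is included only so that a nontrivial strongly connected component appears; because citations normally point to earlier work, such cycles are expected to be rare.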
