Node similarity in the citation graph

Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it. Our metrics take advantage of the local neighborhoods of the nodes in the citation graph. Two variants of link-based similarity estimation between two nodes are described, one based on the separate local neighborhoods of the nodes, and another based on the joint local neighborhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on a subgraph of the citation graph of computer science in a retrieval context. The results are compared with text-based similarity, and demonstrate the complementarity of link-based and text-based retrieval.
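As a rough illustration of the first variant (similarity from the separate local neighborhoods of two nodes), the sketch below expands a fixed-radius neighborhood around each node and scores the overlap. The abstract does not specify the actual metrics, so everything here is an assumption for illustration: the Jaccard overlap, the choice to ignore citation direction, the `radius` parameter, and the function names `to_undirected`, `neighborhood`, and `separate_neighborhood_similarity` are all hypothetical and should not be read as the paper's algorithm.

```python
from collections import deque


def to_undirected(citations):
    """Build an undirected adjacency map from {paper: [cited papers]}.

    Assumption: citation direction is ignored, so citing and being cited
    both count as a link.
    """
    graph = {}
    for src, targets in citations.items():
        graph.setdefault(src, set())
        for dst in targets:
            graph[src].add(dst)
            graph.setdefault(dst, set()).add(src)
    return graph


def neighborhood(graph, start, radius):
    """Return the set of nodes within `radius` hops of `start` (inclusive)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def separate_neighborhood_similarity(graph, a, b, radius=1):
    """Jaccard overlap of the two separately expanded neighborhoods.

    This is a stand-in metric chosen for illustration, not the paper's.
    """
    na = neighborhood(graph, a, radius)
    nb = neighborhood(graph, b, radius)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy citation data (hypothetical): p1 and p2 cite the same two papers.
citations = {"p1": ["p3", "p4"], "p2": ["p3", "p4"], "p5": ["p6"]}
graph = to_undirected(citations)
sim_related = separate_neighborhood_similarity(graph, "p1", "p2")
sim_unrelated = separate_neighborhood_similarity(graph, "p1", "p5")
```

In this toy graph, `p1` and `p2` share both of their cited papers and so score higher than `p1` and `p5`, which have no neighbors in common. The joint-neighborhood variant mentioned in the abstract would instead grow a single neighborhood from both nodes at once; it is not sketched here because the abstract gives no detail on how that neighborhood is scored.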