Fast Random Walk with Restart and Its Applications

How closely related are two nodes in a graph? How can this score be computed quickly on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been used successfully in numerous settings, such as automatic captioning of images, generalizations to "connection subgraphs", personalized PageRank, and many more. However, straightforward implementations of RWR do not scale to large graphs: they require either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speedup with 90%+ quality preservation.
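To make the trade-off in the abstract concrete, the following is a minimal sketch of the two straightforward RWR baselines it refers to: an iterative solver (no pre-computation, but slow per query) and a direct solve of the full linear system (fast queries once an inverse is stored, but costly in space and time). It assumes the common formulation r = c W r + (1 - c) e, where W is the column-normalized adjacency matrix, e is the indicator vector of the query node, and 1 - c is the restart probability. The function names, the value c = 0.9, and the five-node toy graph are illustrative assumptions, not taken from the paper, and this is not the proposed fast method.

```python
import numpy as np

def normalize_columns(A):
    """Scale each column of A to sum to 1 (columns with zero sum are left as zeros)."""
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    return A / col_sums

def rwr_iterative(W, seed, c=0.9, tol=1e-12, max_iter=10000):
    """Baseline 1: iterate r <- c W r + (1 - c) e until the L1 change is below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_closed_form(W, seed, c=0.9):
    """Baseline 2: solve (I - c W) r = (1 - c) e directly."""
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    return (1 - c) * np.linalg.solve(np.eye(n) - c * W, e)

# Hypothetical weighted, undirected toy graph with two loose communities:
# nodes {0, 1, 2} and {3, 4}, joined by a single weak edge 2-3.
A = np.array([
    [0.0, 2.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
])
W = normalize_columns(A)

r_iter = rwr_iterative(W, seed=0)
r_exact = rwr_closed_form(W, seed=0)
print(np.round(r_exact, 4))
```

Both functions return the vector of relevance scores of every node with respect to the query node. On a graph this small the two agree to numerical precision; the scaling problem described above appears only when the number of nodes makes the dense n-by-n system impractical to store or solve.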
