A Novel and Fast SimRank Algorithm

SimRank is a widely adopted similarity measure for objects modeled as nodes in a graph, based on the intuition that two objects are similar if they are referenced by similar objects. The recursive nature of the SimRank definition makes it expensive to compute the similarity score even for a single pair of nodes, and this cost limits the applicability of SimRank. To speed up the computation, some existing works replace the original model with an approximate one and obtain only rough estimates of the SimRank scores. In this work, we propose a novel solution for computing all-pair SimRank scores. In particular, we propose to convert SimRank into the problem of solving a linear system in matrix form, and further prove that the system is non-singular, diagonally dominant, and symmetric positive definite (for undirected graphs). These properties immediately lead to the adoption of Conjugate Gradient (CG) and Bi-Conjugate Gradient (BiCG) techniques for efficiently computing SimRank scores. As a result, a significant improvement in the convergence rate can be achieved, while the sparsity of the adjacency matrix is preserved throughout the computation. Inspired by the existing common neighbor sharing strategy, we further reduce the computational complexity of the matrix multiplication and address the scalability issues. The experimental results show that our proposed algorithms significantly outperform the state-of-the-art algorithms.
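
To make the linear-system view concrete, the following is a minimal sketch rather than the formulation used in the paper. It assumes an undirected graph with no self-loops and no isolated nodes, a decay factor `c` in (0, 1), and dense NumPy arrays for brevity. Writing S = X + I with X zero on the diagonal, and multiplying the SimRank equation for each pair (a, b) with a ≠ b by the degrees d_a d_b, gives d_a d_b X_ab - c (A X A)_ab = c (A A)_ab. That system is symmetric and strictly diagonally dominant, so plain CG applies. The degree scaling and the function names `simrank_naive` and `simrank_cg` are assumptions made for this illustration. The sketch does not exploit sparsity or neighbor sharing.

```python
import numpy as np


def simrank_naive(A, c=0.6, iters=100):
    """Reference fixed-point iteration: S <- c * W^T S W, diagonal reset to 1."""
    d = A.sum(axis=0)
    W = A / d  # column-normalized adjacency: W[i, j] = A[i, j] / d[j]
    S = np.eye(A.shape[0])
    for _ in range(iters):
        S = c * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)
    return S


def simrank_cg(A, c=0.6, tol=1e-10, max_iter=1000):
    """Solve the degree-scaled off-diagonal SimRank system with matrix-form CG.

    Assumption of this sketch: A is a symmetric 0/1 adjacency matrix with
    zero diagonal and every node has degree >= 1.
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    if np.any(d == 0):
        raise ValueError("this sketch assumes no isolated nodes")
    off = 1.0 - np.eye(n)  # mask selecting the off-diagonal unknowns
    DD = np.outer(d, d)    # DD[a, b] = d_a * d_b

    def apply(X):
        # Linear operator L(X) = offdiag(D X D - c A X A)
        return off * (DD * X - c * (A @ X @ A))

    B = off * (c * (A @ A))  # contribution of the fixed diagonal s(a, a) = 1
    X = np.zeros((n, n))
    R = B.copy()             # residual for the initial guess X = 0
    P = R.copy()             # first search direction
    rs = float(np.sum(R * R))
    stop = (tol * np.linalg.norm(B)) ** 2
    iterations = 0
    while iterations < max_iter and rs > stop:
        Q = apply(P)
        alpha = rs / float(np.sum(P * Q))
        X += alpha * P
        R -= alpha * Q
        rs_new = float(np.sum(R * R))
        P = R + (rs_new / rs) * P
        rs = rs_new
        iterations += 1
    return X + np.eye(n), iterations


# Small illustrative graph (hypothetical data): a triangle with a two-edge tail.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A_example = np.zeros((5, 5))
for u, v in edges:
    A_example[u, v] = A_example[v, u] = 1.0
```

Calling `simrank_cg(A_example)` returns the all-pair score matrix together with the number of CG iterations used, and the scores can be checked against `simrank_naive(A_example)`. For directed graphs the scaled operator is no longer symmetric, which is where a BiCG-type solver would be the natural substitute.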
