CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs

SimRank is an important metric for measuring the similarity of nodes in graph data analysis. SimRank computation has been studied extensively; however, no existing work provides a unified algorithm that supports SimRank computation on both static and temporal graphs. In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. CrashSim efficiently computes results with provable approximation guarantees. In addition, because real-life graphs are often represented as temporal graphs, CrashSim also enables efficient computation of SimRank in temporal graphs. We formally define two typical SimRank queries in temporal graphs and solve them with CrashSim-T, an efficient algorithm based on CrashSim. An extensive experimental evaluation on five real-life and synthetic datasets shows that CrashSim and CrashSim-T improve on the efficiency of state-of-the-art SimRank algorithms by about 30%, while achieving a result-set precision of about 97%.
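
For readers unfamiliar with the metric, the sketch below illustrates what a single-source SimRank query returns. It is not CrashSim: it is the naive iterative computation that follows the standard SimRank definition (two distinct nodes are similar if their in-neighbors are similar, scaled by a decay factor), and it makes no attempt at efficiency or approximation guarantees. The function names `simrank` and `single_source`, the decay factor `c=0.6`, and the iteration count are assumptions chosen for illustration only.

```python
from collections import defaultdict


def simrank(edges, c=0.6, iterations=10):
    """Naive iterative SimRank on a directed edge list of (source, target) pairs.

    Illustrative only: cost grows quickly with graph size, so this is
    suitable for tiny graphs, not for the workloads the abstract targets.
    """
    nodes = sorted({n for edge in edges for n in edge})
    in_nbrs = defaultdict(list)
    for u, v in edges:
        in_nbrs[v].append(u)

    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    # A node without in-neighbors is similar to no other node.
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(x, y)]
                                for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


def single_source(edges, u, c=0.6, iterations=10):
    """Return the SimRank score between query node u and every node."""
    sim = simrank(edges, c, iterations)
    return {v: score for (a, v), score in sim.items() if a == u}
```

For example, on the graph with edges `a -> b` and `a -> c`, nodes `b` and `c` share their only in-neighbor, so `single_source(edges, "b")` assigns `c` a score equal to the decay factor.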
