Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees

Given a time-evolving graph, how can we track similarity between nodes quickly and accurately, with theoretical guarantees on convergence and error? Random Walk with Restart (RWR) is a popular measure of similarity between nodes and has been used in numerous applications. Many real-world graphs are dynamic, with frequent insertion and deletion of edges; thus, tracking RWR scores efficiently on dynamic graphs has attracted much interest among data mining researchers. Recently, dynamic RWR models based on propagating scores across a given graph have been proposed, and they outperform previous approaches to computing RWR dynamically. However, those models fail to guarantee exactness and convergence time for updating RWR in a generalized form. In this paper, we propose OSP, a fast and accurate algorithm for computing dynamic RWR under insertion/deletion of nodes/edges in a directed/undirected graph. When the graph is updated, OSP first calculates offset scores around the modified edges, propagates the offset scores across the updated graph, and then merges them with the current RWR scores to obtain the updated RWR scores. We prove the exactness of OSP and introduce OSP-T, a version of OSP that regulates the trade-off between accuracy and computation time using an error tolerance ε. Given restart probability c, OSP-T is guaranteed to return RWR scores with O(ε/c) error in O(log(ε/2)/log(1-c)) iterations. Through extensive experiments, we show that OSP tracks RWR exactly up to 4605x faster than an existing static RWR method on dynamic graphs, and OSP-T requires up to 15x less time with 730x lower L1 norm error and 3.3x lower rank error than other state-of-the-art dynamic RWR methods.
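The offset-propagate-merge idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes the standard RWR formulation r = (1-c) A r + c q, where A is the transpose of the row-normalized adjacency matrix and q is the seed indicator vector. Under that assumption, subtracting the fixed-point equations of the old and new graphs suggests an offset seed of (1-c)(A_new - A_old) r_old, which is propagated on the updated graph and added to the old scores. The function names (`transition`, `propagate`, `rwr_static`, `rwr_update`), the dense NumPy matrices, the toy graph, and the `tol` stopping threshold (loosely playing the role of an error tolerance) are all hypothetical choices made for illustration.

```python
import numpy as np

def transition(adj):
    """Transpose of the row-normalized adjacency matrix (column-stochastic).

    Assumption: every node has at least one out-edge, so no row sums to zero.
    """
    adj = np.asarray(adj, dtype=float)
    out_deg = adj.sum(axis=1, keepdims=True)
    return (adj / out_deg).T

def propagate(mat, seed, c, tol=1e-12, max_iter=10_000):
    """Accumulate seed + (1-c)*mat@seed + ((1-c)*mat)^2@seed + ...

    Stops once the L1 norm of the newest term drops below tol.
    """
    total = seed.copy()
    term = seed.copy()
    for _ in range(max_iter):
        term = (1.0 - c) * (mat @ term)
        total += term
        if np.abs(term).sum() < tol:
            break
    return total

def rwr_static(adj, seed_node, c):
    """RWR scores computed from scratch on a fixed graph."""
    q = np.zeros(len(adj))
    q[seed_node] = 1.0
    return propagate(transition(adj), c * q, c)

def rwr_update(r_old, adj_old, adj_new, c, tol=1e-12):
    """Update old scores after a graph change: offset, propagate, merge."""
    a_old = transition(adj_old)
    a_new = transition(adj_new)
    # Offset scores are nonzero only around the modified edges.
    offset_seed = (1.0 - c) * ((a_new - a_old) @ r_old)
    # Propagate the offsets across the updated graph, then merge.
    return r_old + propagate(a_new, offset_seed, c, tol)

# Toy directed graph with 4 nodes; the update inserts the edge 1 -> 3.
adj_old = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 0]])
adj_new = adj_old.copy()
adj_new[1, 3] = 1

c = 0.15  # restart probability
r_old = rwr_static(adj_old, seed_node=0, c=c)
r_new_dynamic = rwr_update(r_old, adj_old, adj_new, c)
r_new_static = rwr_static(adj_new, seed_node=0, c=c)
print(np.abs(r_new_dynamic - r_new_static).sum())
```

On this toy graph the dynamically updated scores should agree with a from-scratch recomputation up to the stopping threshold. Raising `tol` stops the propagation earlier at the cost of a larger error, which is the kind of accuracy/time trade-off that OSP-T is described as controlling.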
