Unifying the Global and Local Approaches: An Efficient Power Iteration with Forward Push

Personalized PageRank (PPR) is a critical measure of the importance of a node t to a source node s in a graph. The Single-Source PPR (SSPPR) query computes the PPR values of all nodes with respect to s on a directed graph G with n nodes and m edges, and it is an essential operation widely used in graph applications. In this paper, we propose novel algorithms for answering two variants of SSPPR queries: (i) high-precision queries and (ii) approximate queries. For high-precision queries, Power Iteration (PowItr) and Forward Push (FwdPush) are the two fundamental approaches. Given an absolute error threshold λ (typically set as small as 10^{-8}), the only known running time bound of FwdPush is O(m/λ), much worse than the O(m · log(1/λ)) bound of PowItr. Whether FwdPush can achieve the same running time bound as PowItr remains an open question in the research community. We give a positive answer to this question: we show that the running time of a common implementation of FwdPush is actually bounded by O(m · log(1/λ)). Based on this finding, we propose a new algorithm, called Power Iteration with Forward Push (PowerPush), which incorporates the strengths of both PowItr and FwdPush. For approximate queries (with a relative error ε), we propose a new algorithm, called SpeedPPR, with overall expected time bounded by O(n · log n · log(1/ε)) on scale-free graphs. This improves on the state-of-the-art O((n · log n)/ε) bound. We conduct extensive experiments on six real datasets. The experimental results show that PowerPush outperforms the state-of-the-art high-precision algorithm BePI by up to an order of magnitude in both efficiency and accuracy. Furthermore, SpeedPPR outperforms the state-of-the-art approximate algorithm FORA by up to an order of magnitude in all aspects, including query time, accuracy, pre-processing time, and index size.
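To make the two baseline approaches concrete, the following is a minimal Python sketch of textbook Power Iteration and queue-based Forward Push for an SSPPR query. It is an illustration under stated assumptions, not the paper's PowerPush or SpeedPPR implementation. The function names, the adjacency-list dictionary, the teleport probability `alpha = 0.2`, the per-edge push threshold `r_max`, and the use of the total remaining residue as the stopping measure for `lam` are all assumptions introduced here. Every node is assumed to have at least one out-neighbor.

```python
from collections import deque


def power_iteration(graph, source, alpha=0.2, lam=1e-8):
    """Global approach: push the residue of every node in each round.

    graph maps each node to a list of out-neighbors (assumed non-empty).
    Stops once the total remaining residue is at most lam (assumed measure).
    """
    estimate = {v: 0.0 for v in graph}
    residue = {v: 0.0 for v in graph}
    residue[source] = 1.0
    # Each round costs O(m) and shrinks the total residue by a factor (1 - alpha).
    while sum(residue.values()) > lam:
        next_residue = {v: 0.0 for v in graph}
        for u, r_u in residue.items():
            if r_u == 0.0:
                continue
            estimate[u] += alpha * r_u
            share = (1.0 - alpha) * r_u / len(graph[u])
            for v in graph[u]:
                next_residue[v] += share
        residue = next_residue
    return estimate


def forward_push(graph, source, alpha=0.2, r_max=1e-8):
    """Local approach: push only nodes whose residue exceeds r_max * out-degree."""
    estimate = {v: 0.0 for v in graph}
    residue = {v: 0.0 for v in graph}
    residue[source] = 1.0
    queue = deque([source])
    in_queue = {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        r_u = residue[u]
        out = graph[u]
        if r_u <= r_max * len(out):
            continue  # residue too small to be worth pushing
        estimate[u] += alpha * r_u
        residue[u] = 0.0
        share = (1.0 - alpha) * r_u / len(out)
        for v in out:
            residue[v] += share
            if v not in in_queue and residue[v] > r_max * len(graph[v]):
                queue.append(v)
                in_queue.add(v)
    return estimate, residue
```

In both functions the estimates plus the remaining residues always sum to 1, so the leftover residue bounds the total error of the estimates. Power Iteration touches all edges in every round, whereas Forward Push touches only the out-edges of nodes that currently hold enough residue, which is the contrast between the global and local approaches that the abstract refers to.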
