TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. Such queries serve as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of the top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph). To address these deficiencies, we propose TopPPR, an algorithm for top-k PPR queries that ensures at least ρ precision (i.e., at least a ρ fraction of the actual top-k results are returned) with probability at least 1 - 1/n, where ρ ∈ (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, TopPPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs, as it does not require any preprocessing. We experimentally evaluate TopPPR using a variety of benchmark datasets, and demonstrate that TopPPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when TopPPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, TopPPR requires only 15 seconds to answer a top-500 PPR query with ρ = 1.
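
To make the query definition and the precision measure concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper: it computes PPR values by plain power iteration on a toy graph, selects the k highest-scoring nodes, and measures precision as the fraction of the true top-k nodes that appear in a returned set. The function names, the adjacency-list representation, the teleport probability `alpha = 0.2`, and the handling of dangling nodes are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def ppr_power_iteration(graph: Dict[Hashable, List[Hashable]],
                        source: Hashable,
                        alpha: float = 0.2,
                        iters: int = 100) -> Dict[Hashable, float]:
    """Approximate the PPR vector of `source` by power iteration.

    `graph` maps each node to its list of out-neighbors. `alpha` is the
    teleport (restart) probability; its value here is an assumption.
    """
    pi = {v: 0.0 for v in graph}
    pi[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[source] += alpha  # restart mass returns to the source
        for u, out in graph.items():
            if not out:
                # Assumed convention: a walk at a dangling node jumps to the source.
                nxt[source] += (1.0 - alpha) * pi[u]
                continue
            share = (1.0 - alpha) * pi[u] / len(out)
            for v in out:
                nxt[v] += share
        pi = nxt
    return pi


def top_k(scores: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Return the k nodes with the highest scores (ties broken by node id)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


def precision(returned: List[Hashable], truth: List[Hashable]) -> float:
    """Fraction of the true top-k nodes that appear in the returned set."""
    return len(set(returned) & set(truth)) / len(truth)


# Toy directed graph (hypothetical data).
toy_graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
scores = ppr_power_iteration(toy_graph, source=0)
exact_top2 = top_k(scores, k=2)
print(exact_top2, precision([0, 1], exact_top2))
```

Under this measure, a method run with ρ = 1 must return exactly the true top-k set, while ρ = 0.5 would allow up to half of the true top-k nodes to be missed. Power iteration touches every edge in each round, which is the kind of cost that becomes prohibitive on billion-edge graphs.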
