Parallelizing approximate single-source personalized PageRank queries on shared memory

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) $\pi(s,t)$ measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query has many important applications, such as community detection and recommendation. Computing exact answers to single-source PPR queries is prohibitively expensive, so most existing work focuses on approximate solutions. Nevertheless, existing approximate solutions are still inefficient, and answering single-source PPR queries fast enough for online applications remains challenging. This motivates us to devise efficient parallel algorithms for shared-memory multi-core systems. We show how to efficiently parallelize FORA, the state-of-the-art index-based solution, and analyze the complexity of the resulting parallel algorithms. We prove that our proposed algorithm achieves a time complexity of $O(W/P + \log^2 n)$, where W is the time complexity of the sequential FORA algorithm, P is the number of processors used, and n is the number of nodes in the graph. FORA consists of a forward push phase and a random walk phase, and we present optimization techniques for both phases, including effective maintenance of active nodes, more efficient memory access, and cache-aware scheduling.
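
The two FORA phases named above can be illustrated with a minimal sequential sketch. It is not the authors' code: it assumes an adjacency-list dict in which every node has at least one out-neighbor (dangling nodes are not handled), and it takes the push threshold `r_max` and the total walk budget `omega` as plain parameters, whereas FORA chooses them from its accuracy parameters (and its index-based version stores precomputed walk destinations instead of sampling walks at query time). The function names `forward_push`, `random_walk_end`, and `fora` are hypothetical. The sketch relies on the forward-push invariant $\pi(s,t) = \mathrm{reserve}(t) + \sum_v r(v)\,\pi(v,t)$, so the random walks only have to estimate the residue term.

```python
import math
import random
from collections import deque


def forward_push(graph, s, alpha, r_max):
    """Sequential forward push from s; returns (reserve, residue) dicts.

    Assumption: graph maps every node to a non-empty list of out-neighbors.
    """
    reserve, residue = {}, {s: 1.0}
    # Queue of nodes that were active (residue > r_max * out-degree) when
    # enqueued; in_queue prevents duplicate entries.
    queue, in_queue = deque([s]), {s}
    while queue:
        v = queue.popleft()
        in_queue.discard(v)
        r, deg = residue.get(v, 0.0), len(graph[v])
        if r <= r_max * deg:
            continue  # no longer active
        residue[v] = 0.0
        reserve[v] = reserve.get(v, 0.0) + alpha * r
        share = (1.0 - alpha) * r / deg
        for u in graph[v]:
            residue[u] = residue.get(u, 0.0) + share
            if u not in in_queue and residue[u] > r_max * len(graph[u]):
                queue.append(u)
                in_queue.add(u)
    return reserve, residue


def random_walk_end(graph, v, alpha, rng):
    """Random walk with restart: stop with probability alpha at each step."""
    while rng.random() >= alpha:
        v = rng.choice(graph[v])
    return v


def fora(graph, s, alpha=0.2, r_max=1e-4, omega=10000, seed=0):
    """Forward push, then random walks from nodes with leftover residue."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, s, alpha, r_max)
    estimate = dict(reserve)
    r_sum = sum(residue.values())
    for v, r in residue.items():
        if r <= 0.0:
            continue
        walks = math.ceil(r * omega / r_sum)  # at least one walk per node
        weight = r / walks  # each walk carries an equal share of r(v)
        for _ in range(walks):
            t = random_walk_end(graph, v, alpha, rng)
            estimate[t] = estimate.get(t, 0.0) + weight
    return estimate
```

For example, `fora({0: [1, 2], 1: [2], 2: [0]}, 0)` returns estimates that sum to 1 up to floating-point rounding, because pushes and walks only redistribute the source's unit mass.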
Extensive experimental evaluation demonstrates that, on 40 cores, our solution achieves up to a 37× speedup and is 3.3× faster than alternative methods. Moreover, forward push alone can be used for local graph clustering, and our parallel forward push algorithm is 4.8× faster than existing parallel alternatives.
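
The abstract does not spell out the parallel forward push itself. As a generic illustration only, and not the authors' algorithm, the sketch below runs forward push in synchronous rounds over a frontier of active nodes, similar in spirit to frontier-based shared-memory frameworks such as Ligra: each round snapshots and clears the frontier's residues, splits the frontier across workers that write only to private buffers, and then merges the buffers. It assumes every node has at least one out-neighbor and takes `r_max` as a plain parameter; the names `push_chunk` and `frontier_forward_push` are hypothetical. CPython threads will not execute this code truly in parallel because of the global interpreter lock, so the point is the round structure; a C++/OpenMP implementation would typically replace the sequential merge with atomic additions or a parallel reduction.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def push_chunk(graph, chunk, snapshot, alpha):
    """Push every node in chunk using its snapshotted residue.

    Writes go only to private dicts, so workers never share mutable state.
    """
    kept, sent = {}, {}
    for v in chunk:
        r = snapshot[v]
        kept[v] = alpha * r  # mass settled into v's reserve this round
        share = (1.0 - alpha) * r / len(graph[v])
        for u in graph[v]:
            sent[u] = sent.get(u, 0.0) + share
    return kept, sent


def frontier_forward_push(graph, s, alpha, r_max, workers=4):
    """Synchronous, round-based forward push from source s."""
    reserve, residue = {}, {s: 1.0}
    frontier = [s] if residue[s] > r_max * len(graph[s]) else []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Snapshot and clear the frontier's residues before pushing.
            snapshot = {v: residue[v] for v in frontier}
            for v in frontier:
                residue[v] = 0.0
            chunks = [frontier[i::workers] for i in range(workers)]
            work = partial(push_chunk, graph, snapshot=snapshot, alpha=alpha)
            touched = set()
            # Merge per-worker buffers sequentially, in chunk order.
            for kept, sent in pool.map(work, chunks):
                for v, x in kept.items():
                    reserve[v] = reserve.get(v, 0.0) + x
                for u, x in sent.items():
                    residue[u] = residue.get(u, 0.0) + x
                    touched.add(u)
            # Only nodes that received residue can have become active.
            frontier = sorted(u for u in touched
                              if residue[u] > r_max * len(graph[u]))
    return reserve, residue
```

Because each round only moves residue into a reserve or into neighbors' residues, the reserve and residue totals still sum to 1, and on return no node's residue exceeds `r_max` times its out-degree.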