Paper information - MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank

Personalized PageRank (PPR) is a graph algorithm that evaluates the importance of the nodes surrounding a source node. Widely used in social-network applications such as recommender systems, PPR must respond in real time (with low latency) for a good user experience. Existing works either focus on algorithmic optimization to improve precision while neglecting hardware implementation, or focus on distributed global graph processing on large-scale systems to improve throughput rather than response time. Optimizing low-latency local PPR under a tight memory budget on edge devices remains unexplored. In this work, we propose a memory-efficient, low-latency PPR solution, MeLoPPR, with a greatly reduced memory requirement and a flexible trade-off between latency and precision. MeLoPPR combines stage decomposition and linear decomposition and exploits node score sparsity. Through stage and linear decomposition, MeLoPPR breaks the computation on a large graph into computations on a set of smaller sub-graphs, which significantly reduces the memory needed for computation. Through sparsity exploitation, MeLoPPR selects the sub-graphs that contribute the most to precision, reducing the required computation. In addition, through software/hardware co-design, we propose a hardware implementation on a hybrid CPU and FPGA platform that further accelerates the sub-graph computation. We evaluate MeLoPPR on memory-constrained devices, including a personal laptop and a Xilinx Kintex-7 KC705 FPGA, using six real-world graphs. First, MeLoPPR demonstrates memory savings of $1.5\times \sim 13.4\times$ on CPU and $73\times \sim 8699\times$ on FPGA. Second, MeLoPPR allows flexible trade-offs between precision and execution time: when the precision is 80%, the speedup is up to $15\times$ on CPU and up to $707\times$ on FPGA; when the precision is around 90%, the speedup is up to $70\times$ on FPGA.
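For readers unfamiliar with PPR, the following is a minimal sketch of the textbook power-iteration form of the algorithm, not MeLoPPR's stage or linear decomposition. The function names `personalized_pagerank` and `top_k`, the teleport probability `alpha = 0.15`, and the choice to return the mass of dangling nodes to the source are all assumptions made for illustration; the abstract does not specify them.

```python
def personalized_pagerank(adj, source, alpha=0.15, num_iters=50):
    """Textbook PPR by power iteration (illustrative sketch only).

    adj: dict mapping each node to a list of its out-neighbors.
    Returns a dict mapping each reached node to its PPR score.
    """
    scores = {source: 1.0}  # all probability mass starts at the source
    for _ in range(num_iters):
        # With probability alpha the walk teleports back to the source.
        nxt = {source: alpha}
        for node, mass in scores.items():
            nbrs = adj.get(node, [])
            if not nbrs:
                # Assumption: a dangling node sends its mass back to the source.
                nxt[source] = nxt.get(source, 0.0) + (1 - alpha) * mass
                continue
            # Otherwise the remaining mass is split evenly among out-neighbors.
            share = (1 - alpha) * mass / len(nbrs)
            for nb in nbrs:
                nxt[nb] = nxt.get(nb, 0.0) + share
        scores = nxt
    return scores


def top_k(scores, k):
    """Return the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy graph; node 3 cannot be reached from the source 0.
    graph = {0: [1, 2], 1: [0], 2: [0], 3: [0]}
    result = personalized_pagerank(graph, source=0)
    print(top_k(result, 2))
```

In this sketch the score dictionary only ever holds nodes reachable from the source within the number of iterations run, which gives a rough feel for why a local PPR query can touch far less of the graph than a global computation.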

Pan Li | Cong Hao | Yao Chen | Lixiang Li | Zacharie Zirnheld
