PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition

Most methods for Personalized PageRank (PPR) precompute and store all accurate PPR vectors, and at query time, return the ones of interest directly. However, the storage and computation of all accurate PPR vectors can be prohibitive for large graphs, especially in caching them in memory for real-time online querying. In this paper, we propose a distributed framework that strikes a better balance between offline indexing and online querying. The offline indexing attains a fingerprint of the PPR vector of each vertex by performing billions of ``short'' random walks in parallel across a cluster of machines. We prove that our indexing method has an exponential convergence, achieving the same precision with previous methods using a much smaller number of random walks. At query time, the new PPR vector is composed by a linear combination of related fingerprints, in a highly efficient vertex-centric decomposition manner. Interestingly, the resulting PPR vector is much more accurate than its offline counterpart because it actually uses more random walks in its estimation. More importantly, we show that such decomposition for a batch of queries can be very efficiently processed using a shared decomposition. Our implementation, PowerWalk, takes advantage of advanced distributed graph engines and it outperforms the state-of-the-art algorithms by orders of magnitude. Particularly, it responses to tens of thousands of queries on graphs with billions of edges in just a few seconds.

[1]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[2]  C S LuiJohn,et al.  Walking in the cloud , 2015, VLDB 2015.

[3]  Jure Leskovec,et al.  Predicting positive and negative links in online social networks , 2010, WWW '10.

[4]  Taher H. Haveliwala,et al.  The Second Eigenvalue of the Google Matrix , 2003 .

[5]  Ashish Goel,et al.  Fast Incremental and Personalized PageRank , 2010, Proc. VLDB Endow..

[6]  Zhenguo Li,et al.  VENUS: Vertex-centric streamlined graph computation on a single PC , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[7]  Konstantin Avrachenkov,et al.  Monte Carlo Methods in PageRank Computation: When One Iteration is Sufficient , 2007, SIAM J. Numer. Anal..

[8]  Shih-Fu Chang,et al.  Learning with Partially Absorbing Random Walks , 2012, NIPS.

[9]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[10]  András A. Benczúr,et al.  To randomize or not to randomize: space optimal summaries for hyperlink analysis , 2006, WWW '06.

[11]  Keshav Pingali,et al.  The tao of parallelism in algorithms , 2011, PLDI '11.

[12]  Marco Rosa,et al.  Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks , 2010, WWW.

[13]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[14]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[15]  Lee Sael,et al.  BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs , 2015, SIGMOD Conference.

[16]  Prabhakar Raghavan,et al.  Efficiency-Quality Tradeoffs for Vector Score Aggregation , 2004, VLDB.

[17]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[18]  Zhenguo Li,et al.  VENUS: A System for Streamlined Graph Computation on a Single PC , 2016, IEEE Transactions on Knowledge and Data Engineering.

[19]  Pavel Berkhin,et al.  A Survey on PageRank Computing , 2005, Internet Math..

[20]  Jimmy J. Lin,et al.  WTF: the who to follow service at Twitter , 2013, WWW.

[21]  Rizal Setya Perdana What is Twitter , 2013 .

[22]  Christos Faloutsos,et al.  Fast Random Walk with Restart and Its Applications , 2006, Sixth International Conference on Data Mining (ICDM'06).

[23]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[24]  Reynold Cheng,et al.  Walking in the Cloud: Parallel SimRank at Scale , 2015, Proc. VLDB Endow..

[25]  Gregory Buehrer,et al.  A scalable pattern mining approach to web graph compression with communities , 2008, WSDM '08.

[26]  Reynold Xin,et al.  GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.

[27]  Qiang Yang,et al.  2015 Ieee International Conference on Big Data (big Data) Mining Target Users for Online Marketing Based on App Store Data , 2022 .

[28]  Dániel Fogaras,et al.  Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments , 2005, Internet Math..

[29]  Pavel Berkhin,et al.  Bookmark-Coloring Algorithm for Personalized PageRank Computing , 2006, Internet Math..

[30]  Yasuhiro Fujiwara,et al.  Efficient personalized pagerank with accuracy assurance , 2012, KDD.

[31]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[32]  Ashish Goel,et al.  FAST-PPR: scaling personalized pagerank estimation for large graphs , 2014, KDD.

[33]  Aapo Kyrola,et al.  DrunkardMob: billions of random walks on just a PC , 2013, RecSys.

[34]  Ken-ichi Kawarabayashi,et al.  Scalable similarity search for SimRank , 2014, SIGMOD Conference.

[35]  Takuya Akiba,et al.  Computing Personalized PageRank Quickly by Exploiting Graph Structures , 2014, Proc. VLDB Endow..

[36]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[37]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[38]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[39]  Dong Xin,et al.  Fast personalized PageRank on MapReduce , 2011, SIGMOD '11.

[40]  Yasuhiro Fujiwara,et al.  Fast and Exact Top-k Search for Random Walk with Restart , 2012, Proc. VLDB Endow..

[41]  Sebastiano Vigna,et al.  A large time-aware web graph , 2008, SIGF.

[42]  Torsten Suel,et al.  Batch query processing for web search engines , 2011, WSDM '11.

[43]  Sebastiano Vigna,et al.  The webgraph framework I: compression techniques , 2004, WWW '04.