LR-PPR: Locality-Sensitive, Re-use Promoting, Approximate Personalized PageRank Computation

Personalized PageRank (PPR) based measures of node proximity have been shown to be highly effective in many prediction and recommendation applications. Using personalized PageRank on large graphs, however, is difficult because of its high computation cost. In this paper, we propose a Locality-sensitive, Re-use promoting, approximate personalized PageRank (LR-PPR) algorithm that computes PPR values efficiently by relying on the localities of the given seed nodes on the graph: (a) LR-PPR is locality sensitive in the sense that it reduces the cost of the PPR computation by focusing on the local neighborhoods of the seed nodes. (b) LR-PPR is re-use promoting in that, instead of performing a monolithic computation for the given seed node set over the entire graph, it divides the work into localities of the seeds and caches the intermediate results obtained during the computation. These cached results are then reused for future queries that share seed nodes. Experimental results on different data sets and under different scenarios show that the LR-PPR algorithm is highly efficient and accurate.
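
The re-use idea in (b) can be illustrated with a minimal sketch. It is not the LR-PPR algorithm itself: the abstract does not give the algorithm's details, and the sketch does not restrict computation to seed localities. It only relies on the standard fact that PPR is linear in the teleport vector, so the PPR vector for a uniformly weighted seed set is the average of the single-seed vectors, which can be cached per seed and reused by later queries. The toy graph `GRAPH`, the function `power_iteration_ppr`, the class `SeedPPRCache`, and the value `alpha = 0.85` are all hypothetical names and choices made for illustration.

```python
# Hypothetical toy graph (adjacency lists); every node has at least one out-edge.
GRAPH = {
    0: [1, 2],
    1: [2],
    2: [0],
    3: [2, 0],
}


def power_iteration_ppr(graph, teleport, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate p <- alpha * W p + (1 - alpha) * teleport until the L1 change < tol.

    alpha is the probability of following an out-edge; (1 - alpha) is the
    restart probability. Assumes no dangling nodes.
    """
    p = dict(teleport)
    for _ in range(max_iter):
        nxt = {v: (1.0 - alpha) * teleport[v] for v in graph}
        for u, out_neighbors in graph.items():
            share = alpha * p[u] / len(out_neighbors)
            for v in out_neighbors:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in graph)
        p = nxt
        if delta < tol:
            break
    return p


class SeedPPRCache:
    """Caches single-seed PPR vectors and combines them for seed-set queries."""

    def __init__(self, graph, alpha=0.85):
        self.graph = graph
        self.alpha = alpha
        self.cache = {}  # seed node -> single-seed PPR vector

    def single(self, seed):
        # Compute the single-seed vector only on a cache miss.
        if seed not in self.cache:
            teleport = {v: 1.0 if v == seed else 0.0 for v in self.graph}
            self.cache[seed] = power_iteration_ppr(self.graph, teleport, self.alpha)
        return self.cache[seed]

    def query(self, seeds):
        # By linearity, a uniform seed set is the average of its single-seed vectors.
        vectors = [self.single(s) for s in seeds]
        return {v: sum(vec[v] for vec in vectors) / len(vectors) for v in self.graph}


if __name__ == "__main__":
    cache = SeedPPRCache(GRAPH)
    print(cache.query([0, 3]))  # computes and caches the vectors for seeds 0 and 3
    print(cache.query([3, 1]))  # reuses seed 3; only seed 1 is newly computed
```

In this sketch the second query pays only for the seed it does not share with the first. Each cached vector is still computed over the whole graph, which is the cost the abstract says LR-PPR reduces by working within seed localities.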
