Fast PageRank approximation by adaptive sampling

PageRank is typically computed from powers of the transition matrix of a Markov chain model. The computation is therefore expensive, and efficient approximation methods are needed to accelerate it, especially on large graphs. In this paper, we propose two sampling algorithms for efficient PageRank approximation: direct sampling and adaptive sampling. Both methods sample the transition matrix and use the sample in the PageRank computation. Direct sampling samples the transition matrix once and uses that sample directly, whereas adaptive sampling samples the transition matrix multiple times with a sample rate that is adjusted iteratively as the computation proceeds. The adaptive sample rate is designed to give a good trade-off between accuracy and efficiency. We provide a detailed theoretical analysis of the error bounds of both methods. We also compare them with several state-of-the-art PageRank approximation methods, including power extrapolation and the inner–outer power iteration algorithm. Experimental results on several real-world datasets show that our methods achieve significantly higher efficiency than state-of-the-art methods while attaining comparable accuracy.
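The contrast between the two schemes can be illustrated with a minimal sketch. The abstract does not specify the sampling distribution or the rate schedule, so everything below is an assumption made for illustration: entries are kept independently with probability `rate` and rescaled by `1/rate`, the adaptive rate grows geometrically (`rate0`, `growth`), and the iterate is renormalized after every step. All function names are hypothetical and are not taken from the paper.

```python
import random

def transition_entries(edges, n):
    """Column-stochastic transition matrix as (dst, src, weight) triples."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    return [(dst, src, 1.0 / out_deg[src]) for src, dst in edges]

def sample_entries(entries, rate, rng):
    """Assumed scheme: keep each entry with probability `rate`, rescale by 1/rate."""
    if rate >= 1.0:
        return list(entries)
    return [(i, j, w / rate) for (i, j, w) in entries if rng.random() < rate]

def power_step(entries, x, damping):
    """One damped power-iteration step, renormalized so the result sums to 1."""
    n = len(x)
    y = [(1.0 - damping) / n] * n
    for i, j, w in entries:
        y[i] += damping * w * x[j]
    total = sum(y)
    return [v / total for v in y]

def pagerank_direct(entries, n, rate, rng, damping=0.85, tol=1e-10, max_iter=200):
    """Sample the matrix once, then iterate on that fixed sample."""
    sample = sample_entries(entries, rate, rng)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = power_step(sample, x, damping)
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

def pagerank_adaptive(entries, n, rate0, growth, rng,
                      damping=0.85, tol=1e-10, max_iter=200):
    """Resample at every iteration; the geometric rate schedule is hypothetical."""
    x = [1.0 / n] * n
    rate = rate0
    for _ in range(max_iter):
        y = power_step(sample_entries(entries, rate, rng), x, damping)
        # Only test convergence once the full matrix is being used.
        done = rate >= 1.0 and sum(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if done:
            break
        rate = min(1.0, rate * growth)
    return x

# Toy graph with no dangling nodes (every node has an out-edge).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
entries = transition_entries(edges, 4)
rng = random.Random(0)
exact = pagerank_direct(entries, 4, 1.0, rng)       # rate 1.0 keeps every entry
approx = pagerank_direct(entries, 4, 0.8, rng)      # one fixed sample
adaptive = pagerank_adaptive(entries, 4, 0.5, 1.5, rng)
```

In this sketch the adaptive variant starts with cheap, sparse samples and ends on the full matrix, so it converges to the same vector as the exact iteration; a fixed sample with `rate < 1` generally does not. Whether the paper's schedule behaves this way is not stated in the abstract.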
