A Fast Two-Stage Algorithm for Computing PageRank

We present a fast two-stage algorithm for computing the PageRank vector [16]. The algorithm exploits the following observation: the homogeneous discrete-time Markov chain associated with PageRank is lumpable, with the lumpable subset of nodes being the dangling nodes [13]. Convergence takes only a fraction of the time required by the standard algorithm employed by Google [16]. On a data set of 451,237 webpages, the two-stage algorithm converged in 20% of the time required by the standard algorithm. Our algorithm also replaces a common practice that is in general incorrect: ignoring the dangling nodes until the last stages of computation [16], which does not necessarily accelerate convergence. In comparison, our algorithm is provable, generally applicable, and achieves the desired speedup. The paper ends with a discussion of possible extensions that generalize the divide-and-conquer theme. We describe two variations that incorporate a multi-stage algorithm. In the first variation, the ordinary PageRank vector is computed. In the second variation, the algorithm computes a generalized version of PageRank in which webpages are divided into several classes, each incorporating a different personalization vector. The latter represents a major modeling extension and introduces greater flexibility and a potentially more refined model for web traffic.
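The lumping idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a uniform personalization vector, assumes that each dangling row is replaced by that same uniform vector, and uses a damping factor `c = 0.85`. The function name `two_stage_pagerank` and its parameters are hypothetical. Stage 1 runs power iteration on the smaller chain in which all dangling nodes are merged into one state. Stage 2 is simplified here to a single update that recovers the individual dangling-node values from the stage 1 result, which is possible because, under these assumptions, every dangling node has the same outgoing transition row.

```python
def two_stage_pagerank(links, n, c=0.85, tol=1e-12, max_iter=1000):
    """Sketch of PageRank with dangling nodes lumped into one state.

    links: dict mapping a node index to the list of nodes it links to.
    n:     number of nodes, indexed 0..n-1.
    Assumes uniform personalization and uniform dangling-row replacement.
    """
    out = {i: list(links.get(i, [])) for i in range(n)}
    nondangling = [i for i in range(n) if out[i]]
    dangling = [i for i in range(n) if not out[i]]
    v = 1.0 / n  # uniform personalization weight per page

    # Stage 1: power iteration on the lumped chain, whose states are the
    # nondangling nodes plus one state holding all dangling mass `s`.
    x = {i: v for i in nondangling}
    s = len(dangling) * v
    for _ in range(max_iter):
        # Every page receives teleportation mass plus a share of the
        # mass leaving the lumped dangling state.
        base = (1.0 - c) * v + c * s * v
        new_x = {i: base for i in nondangling}
        for i in nondangling:
            share = c * x[i] / len(out[i])
            for j in out[i]:
                if j in new_x:  # links into dangling nodes go to the lump
                    new_x[j] += share
        new_s = 1.0 - sum(new_x.values())
        change = abs(new_s - s) + sum(abs(new_x[i] - x[i]) for i in nondangling)
        x, s = new_x, new_s
        if change < tol:
            break

    # Stage 2 (simplified): split the lumped mass among the dangling nodes
    # with one application of the PageRank update.
    base = (1.0 - c) * v + c * s * v
    rank = [0.0] * n
    for i in nondangling:
        rank[i] = x[i]
    for j in dangling:
        rank[j] = base
    for i in nondangling:
        share = c * x[i] / len(out[i])
        for j in out[i]:
            if not out[j]:
                rank[j] += share
    return rank


if __name__ == "__main__":
    # Nodes 3 and 4 have no out-links, so they are dangling.
    example = {0: [1, 2], 1: [2, 3], 2: [0, 4]}
    print(two_stage_pagerank(example, 5))
```

In this sketch the iteration in stage 1 touches only the nondangling nodes and a single scalar for the dangling mass, so the per-iteration work shrinks as the share of dangling nodes grows.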

[1] Herbert A. Simon, et al. Aggregation of Variables in Dynamic Systems, 1961.

[2] William J. Stewart, et al. Iterative aggregation/disaggregation techniques for nearly uncoupled Markov chains, 1985, JACM.

[3] Charles R. Johnson, et al. Matrix analysis, 1985, Statistical Inference for Engineers and Data Scientists.

[4] Carl D. Meyer, et al. Stochastic Complementation, Uncoupling Markov Chains, and the Theory of Nearly Reducible Systems, 1989, SIAM Rev.

[5] Robert J. Plemmons, et al. Nonnegative Matrices in the Mathematical Sciences, 1979, Classics in Applied Mathematics.

[6] W. Stewart, et al. Quasi Lumpability, Lower-Bounding Coupling Matrices, and Nearly Completely Decomposable Markov Chains, 1997.

[7] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[8] Sriram Raghavan, et al. WebBase: a repository of Web pages, 2000, Comput. Networks.

[9] Jasmine Novak, et al. PageRank Computation and the Structure of the Web: Experiments and Algorithms, 2002.

[10] Gene H. Golub, et al. Extrapolation methods for accelerating PageRank computations, 2003, WWW '03.

[11] Jennifer Widom, et al. Scaling personalized web search, 2003, WWW '03.

[12] Gene H. Golub, et al. Exploiting the Block Structure of the Web for Computing, 2003.

[13] Taher H. Haveliwala, et al. The Second Eigenvalue of the Google Matrix, 2003.

[14] Taher H. Haveliwala, et al. Adaptive methods for the computation of PageRank, 2004.

[15] B. Nordstrom. Finite Markov Chains, 2005.