Extrapolation methods for accelerating PageRank computations

We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the "importance" of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. Quadratic Extrapolation exploits the fact that the principal eigenvalue of a Markov matrix is known to be 1 to estimate the nonprincipal eigenvectors from successive iterates of the Power Method. Empirically, we show that Quadratic Extrapolation speeds up PageRank computation by 25–300% on a Web graph of 80 million nodes, with minimal overhead. The method is of interest to the PageRank community and to the numerical linear algebra community more broadly, since it offers a fast way to determine the dominant eigenvector of a matrix too large for standard fast methods to be practical.
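The Power Method and the extrapolation step are described above only in words. The block below is a minimal NumPy sketch of the idea and not the authors' implementation. It makes several assumptions:

- A small dense Google matrix A = cP^T + (1 - c)E^T with damping c = 0.85.
- Uniform teleportation.
- Dangling pages treated as linking to every page.
- The names `google_matrix`, `quadratic_extrapolation`, `pagerank` and the parameter `extrapolate_every` are hypothetical.

The extrapolation step models x^(k-3) as a combination of the first three eigenvectors and fixes γ3 = 1. It then solves for γ1 and γ2 by least squares, using `np.linalg.lstsq` here rather than an explicit QR factorization. Because λ1 = 1 is known, the factor (λ - 1) can be divided out of the characteristic polynomial. The three newest iterates are then combined with the remaining coefficients β0, β1, β2.

```python
import numpy as np


def google_matrix(adj: np.ndarray, c: float = 0.85) -> np.ndarray:
    """Dense column-stochastic A = c * P^T + (1 - c) / n (small graphs only).

    adj[i, j] = 1 means page i links to page j. Assumptions: dangling pages
    link uniformly to every page, and teleportation is uniform.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize out-links; rows with no out-links become uniform.
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1.0), 1.0 / n)
    return c * P.T + (1.0 - c) / n


def quadratic_extrapolation(x0, x1, x2, x3):
    """Estimate the principal eigenvector from x0 = x^(k-3), ..., x3 = x^(k).

    Assumes x^(k-3) lies (approximately) in the span of the first three
    eigenvectors and uses the known eigenvalue lambda_1 = 1.
    """
    # y^(j) = x^(j) - x^(k-3) for j = k-2, k-1, k (this eliminates gamma_0).
    y1, y2, y3 = x1 - x0, x2 - x0, x3 - x0
    Y = np.column_stack([y1, y2])
    # With gamma_3 = 1, solve Y @ (gamma_1, gamma_2) ~= -y^(k) by least squares.
    (g1, g2), *_ = np.linalg.lstsq(Y, -y3, rcond=None)
    g3 = 1.0
    # Coefficients of q(lambda) / (lambda - 1) = b0 + b1*lambda + b2*lambda^2.
    b0, b1, b2 = g1 + g2 + g3, g2 + g3, g3
    return b0 * x1 + b1 * x2 + b2 * x3


def pagerank(A, tol=1e-10, max_iter=1000, extrapolate_every=None):
    """Power Method on A, optionally applying Quadratic Extrapolation every
    `extrapolate_every` iterations. Returns (x, number of iterations)."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    recent = [x]  # up to four most recent iterates since the last restart
    for k in range(1, max_iter + 1):
        x_next = A @ x
        x_next /= x_next.sum()  # rescale so the entries sum to 1
        if np.abs(x_next - x).sum() < tol:  # L1 residual test
            return x_next, k
        x = x_next
        recent = (recent + [x])[-4:]
        if extrapolate_every and k % extrapolate_every == 0 and len(recent) == 4:
            x = quadratic_extrapolation(*recent)
            x /= x.sum()
            recent = [x]  # restart the history from the extrapolated vector
    return x, max_iter


# Toy usage on a random graph (illustrative only).
rng = np.random.default_rng(0)
adj = (rng.random((200, 200)) < 0.03).astype(float)
np.fill_diagonal(adj, 0.0)
A = google_matrix(adj, c=0.85)

x_plain, iters_plain = pagerank(A)
x_qe, iters_qe = pagerank(A, extrapolate_every=10)
print(iters_plain, iters_qe, np.abs(x_plain - x_qe).sum())
```

On a tiny random graph like this one, the iteration counts say little about the speedups reported for the 80-million-node Web graph. The sketch only shows that the extrapolated run reaches the same PageRank vector as the plain Power Method. A realistic version would use a sparse link matrix and never form A densely.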