When Rank Trumps Precision: Using the Power Method to Compute Google's PageRank

The PageRank algorithm, developed by Google founders Larry Page and Sergey Brin, assigns ranking scores to webpages that reflect their relative importance. These scores are based primarily on the link structure of the Web graph and correspond to elements of a dominant left eigenvector, called the PageRank vector, of the stochastic Google matrix. When the starting vector is a probability vector, the iterates of the power method applied to the Google matrix converge to the PageRank vector. Determining when to stop the iterations requires deciding when an iterate vector is good enough. Existing termination criteria rely on various measures of distance between successive iterate vectors. In this dissertation, we investigate how well a power method iterate vector approximates the PageRank vector, we show that the existing termination criteria do not guarantee accurate ranking, and we provide a computationally efficient criterion for determining relative rankings, exact rankings, and ranking intervals of PageRank scores.
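The power iteration described in the abstract can be sketched on a toy graph. This is a minimal illustration, not the dissertation's method. The four-page link structure, the damping factor `alpha = 0.85`, the uniform treatment of dangling pages, and the names `google_matrix` and `power_method` are all assumptions made for the example. The stopping test is the conventional kind mentioned in the abstract: a 1-norm distance between successive iterates. The abstract argues that such a test does not by itself guarantee a correct ranking.

```python
def google_matrix(links, n, alpha=0.85):
    """Build a dense row-stochastic Google matrix for a tiny graph.

    links maps a page index to the list of distinct pages it links to.
    alpha is an assumed damping factor; dangling pages (no out-links)
    are assumed to jump uniformly to every page.
    """
    G = []
    for i in range(n):
        out = links.get(i, [])
        if out:
            # Teleportation mass shared by all pages, plus link mass.
            row = [(1.0 - alpha) / n] * n
            for j in out:
                row[j] += alpha / len(out)
        else:
            row = [1.0 / n] * n
        G.append(row)
    return G


def power_method(G, tol=1e-10, max_iter=1000):
    """Iterate x <- x G from the uniform probability vector.

    Stops when the 1-norm distance between successive iterates drops
    below tol. Returns the final iterate and the iteration count.
    """
    n = len(G)
    x = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        # Left multiplication: entry j of x G is sum_i x[i] * G[i][j].
        x_new = [sum(x[i] * G[i][j] for i in range(n)) for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if diff < tol:
            break
    return x, k


# Hypothetical link structure: page 3 has out-links but no in-links.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
G = google_matrix(links, 4)
pi, iters = power_method(G)

# Pages ordered from highest to lowest score.
ranking = sorted(range(4), key=lambda i: -pi[i])
print(ranking, iters)
```

On this graph the scores are well separated, so the ordering settles long before the iterates converge to the tolerance. When two scores are nearly equal, a small distance between successive iterates says little about whether their relative order is already correct, which is the gap the dissertation's criterion is meant to address.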
