Ordinal Ranking for Google's PageRank

We present computationally efficient criteria that can guarantee correct ordinal ranking of Google's PageRank scores when they are computed with the power method (ordinal ranking of a list assigns an ordinal number to each item in the list). We discuss the tightness of the ranking criteria and illustrate their effectiveness for top-$k$ and bucket ranking. We present a careful implementation of the power method, combined with a roundoff error analysis that is valid for matrix dimensions $n<10^{14}$. To first order, the roundoff error depends neither on $n$ nor on the iteration count, but only on the maximal number of inlinks and the dangling nodes. The applicability of our ranking criterion is limited by the roundoff error from a single matrix-vector multiply. Numerical experiments suggest that our criteria can effectively rank the top PageRank scores. We also discuss how to implement ranking for extremely large practical problems by curbing roundoff error, reducing the matrix dimension, and using faster-converging methods.
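The idea of certifying a ranking from power-method iterates can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: a uniform personalization vector, dangling nodes spread uniformly, a damping factor `alpha`, and exact arithmetic (roundoff, which the abstract identifies as the limiting factor, is ignored here). Under those assumptions the iterate $x^{(k)}$ satisfies $\|x^{(k)}-\pi\|_1 \le \beta = \frac{\alpha}{1-\alpha}\|x^{(k)}-x^{(k-1)}\|_1$, so $x_i^{(k)} - x_j^{(k)} > \beta$ implies $\pi_i > \pi_j$. The names `pagerank_power` and `certified_order` are hypothetical.

```python
def pagerank_power(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method on a graph given as out_links[i] = list of pages i links to.

    Assumes uniform personalization and uniform handling of dangling nodes.
    Returns the iterate x and the error bound beta >= ||x - pi||_1
    (valid in exact arithmetic only).
    """
    n = len(out_links)
    x = [1.0 / n] * n
    beta = float("inf")
    for _ in range(max_iter):
        # Mass sitting on dangling nodes is redistributed uniformly.
        dangling = sum(x[i] for i in range(n) if not out_links[i])
        base = (alpha * dangling + (1.0 - alpha)) / n
        new = [base] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * x[i] / len(targets)
                for j in targets:
                    new[j] += share
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        beta = alpha / (1.0 - alpha) * diff
        if beta < tol:
            break
    return x, beta


def certified_order(x, beta):
    """Return the leading positions of the ranking that beta certifies.

    Pages are sorted by decreasing score; a position is certified while the
    gap to the next score exceeds beta. Stops at the first unresolved gap.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    certified = []
    for pos in range(len(order) - 1):
        if x[order[pos]] - x[order[pos + 1]] > beta:
            certified.append(order[pos])
        else:
            return certified
    # Every gap is resolved, so the last position is fixed as well.
    certified.extend(order[-1:])
    return certified


# Small hypothetical graph: page 2 receives the most inlinks, page 3 none.
graph = [[1, 2], [2], [0], [2]]
scores, bound = pagerank_power(graph)
ranking = certified_order(scores, bound)
```

Stopping at the first unresolved gap yields a certified top-$k$ list rather than a full ranking, which is the natural outcome when scores in the tail are closer together than the error bound.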
