Google's PageRank and beyond - the science of search engine rankings

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While the other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.

Features: many illustrative examples and entertaining asides; MATLAB code; an accessible and informal style; a complete and self-contained section for mathematics review.
