Link Evolution: Analysis and Algorithms

We anticipate that future web search techniques will exploit changes in web structure and content. As a first step in this direction, we examine the problem of integrating observed changes in link structure into static hyperlink-based ranking computations. We present a very efficient algorithm to incrementally compute good approximations to Google's PageRank [Brin and Page 98] as links evolve. Our experiments show that this algorithm is fast and yields excellent approximations to PageRank, even after large changes to the link structure.

Our algorithm derives intuition and partial justification from a rigorous sensitivity analysis of Markov chains. Consider a regular Markov chain with stationary probability π, and suppose the transition probability into a state j is increased. We prove that this can only cause
• π_j to increase: adding a link to a site can only cause the stationary probability of the target site to increase;
• the rank of j to improve: if the states are ordered according to their stationary probabilities, then adding a link to a site can only cause the rank of the target site to improve.
This analysis formalizes why the intuition that drives Google never fails.
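The monotonicity claim for π_j can be illustrated numerically on a toy chain. The sketch below is not the paper's incremental algorithm; it is plain power iteration on a hypothetical 3-state transition matrix. The function names `stationary` and `shift_mass` and the matrix values are illustrative choices, and it assumes the increase in the transition probability into j is paid for by reducing another entry of the same row.

```python
def stationary(P, iters=200):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeated left-multiplication (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def shift_mass(P, i, k, j, delta):
    """Return a copy of P in which row i moves `delta` of transition
    probability from target k to target j (assumed form of the perturbation)."""
    Q = [row[:] for row in P]
    Q[i][k] -= delta
    Q[i][j] += delta
    return Q


# Hypothetical regular chain; it is doubly stochastic, so pi is uniform (1/3 each).
P = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

# Increase the transition probability into state j = 2 from state 0,
# taking the mass from the self-loop at state 0.
Q = shift_mass(P, i=0, k=0, j=2, delta=0.25)

before = stationary(P)
after = stationary(Q)

print("pi_2 before:", before[2])  # analytically 1/3
print("pi_2 after: ", after[2])   # analytically 5/12
```

On this example the stationary probability of state 2 rises from 1/3 to 5/12, consistent with the first claim. A single toy instance is only a sanity check, not evidence for the general theorem.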

[1] Michael I. Jordan, et al. Stable algorithms for link analysis, 2001, SIGIR '01.

[2] Robert J. Plemmons, et al. Nonnegative Matrices in the Mathematical Sciences, 1979, Classics in Applied Mathematics.

[3] J. Meyer. The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains, 1975.

[4] Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1999, SODA '98.

[5] C. D. Meyer, et al. Markov chain sensitivity measured by mean first passage times, 2000.

[6] William J. Stewart, et al. Introduction to the numerical solution of Markov Chains, 1994.

[7] Carl D. Meyer, et al. Updating pagerank using the group inverse and stochastic complementation, 2002.

[8] Eli Upfal, et al. Stochastic models for the Web graph, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[9] Ravi Kumar, et al. Self-similarity in the web, 2001, TOIT.

[10] Michael I. Jordan, et al. Link Analysis, Eigenvectors and Stability, 2001, IJCAI.

[11] C. D. Meyer, et al. Comparison of perturbation bounds for the stationary distribution of a Markov chain, 2001.

[12] Ilse C. F. Ipsen, et al. Uniform Stability of Markov Chains, 1994, SIAM J. Matrix Anal. Appl.

[13] Raymie Stata, et al. The Link Database: fast access to graphs of the Web, 2002, Proceedings DCC 2002. Data Compression Conference.

[14] Andrei Z. Broder, et al. Graph structure in the Web, 2000, Comput. Networks.

[15] Marc Najork, et al. Mercator: A scalable, extensible Web crawler, 1999, World Wide Web.

[16] John G. Kemeny, et al. Finite Markov Chains, 1960.

[17] Steven K. Donoho, et al. Link Analysis, 2005, Data Mining and Knowledge Discovery Handbook.

[18] Andrew McCallum, et al. Automating the Construction of Internet Portals with Machine Learning, 2000, Information Retrieval.

[19] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.

[20] B. Nordstrom. Finite Markov Chains, 2005.

[21] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.