Efficient PageRank approximation via graph aggregation

We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. We (1) partition the Web's graph into classes of quasi-equivalent vertices, (2) project the page-based random walk to be approximated onto those classes, and (3) compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web graph containing over 1.4 billion pages, and were able to produce a ranking that has a Spearman rank-order correlation of 0.95 with respect to PageRank. A simplistic implementation of our method required less than half the running time of a highly optimized implementation of PageRank, which suggests that larger speedup factors are probably possible.
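The three steps and the reconstruction can be illustrated with a minimal Python sketch. It is not the implementation evaluated above: the abstract does not specify how the walk is projected or how page scores are rebuilt, so the choices here are assumptions made for illustration. Specifically, the sketch weights all pages of a host uniformly when projecting, uses a damping factor of 0.85 with uniform teleportation over hosts, and splits each host's probability among its pages in proportion to `1 + in-degree`. The names `aggregate_rank`, `links`, and `host_of` are hypothetical.

```python
from collections import defaultdict


def aggregate_rank(links, host_of, damping=0.85, iters=100):
    """Approximate a page distribution via a host-level random walk.

    links:   dict mapping a page to the list of pages it links to
    host_of: dict mapping every page (including all link targets) to its host
    """
    # (1) Partition pages into classes; here one class per host.
    members = defaultdict(list)
    for page in sorted(host_of):
        members[host_of[page]].append(page)
    hosts = sorted(members)
    n = len(hosts)

    # (2) Project the page walk onto hosts (assumed: uniform weight per page).
    trans = {h: defaultdict(float) for h in hosts}
    for h in hosts:
        weight = 1.0 / len(members[h])
        for page in members[h]:
            outs = links.get(page, [])
            if outs:
                for target in outs:
                    trans[h][host_of[target]] += weight / len(outs)
            else:
                # Dangling page: spread its weight uniformly over all hosts.
                for g in hosts:
                    trans[h][g] += weight / n

    # (3) Stationary distribution of the host walk by power iteration.
    host_rank = {h: 1.0 / n for h in hosts}
    for _ in range(iters):
        nxt = {h: (1.0 - damping) / n for h in hosts}
        for h in hosts:
            for g, prob in trans[h].items():
                nxt[g] += damping * host_rank[h] * prob
        host_rank = nxt

    # Reconstruct page scores (assumed: proportional to 1 + in-degree).
    indeg = defaultdict(int)
    for outs in links.values():
        for target in outs:
            indeg[target] += 1
    page_rank = {}
    for h in hosts:
        total = sum(1 + indeg[p] for p in members[h])
        for p in members[h]:
            page_rank[p] = host_rank[h] * (1 + indeg[p]) / total
    return host_rank, page_rank


# Toy graph: two hosts, three pages.
links = {
    "a.com/1": ["a.com/2", "b.com/1"],
    "a.com/2": ["a.com/1"],
    "b.com/1": ["a.com/1"],
}
host_of = {"a.com/1": "a.com", "a.com/2": "a.com", "b.com/1": "b.com"}
host_rank, page_rank = aggregate_rank(links, host_of)
```

Because the power iteration runs over hosts rather than pages, each iteration touches far fewer states than a page-level computation would, which is the source of the speedup the abstract reports.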
