Efficient PageRank approximation via graph aggregation

We present a framework for approximating random-walk-based probability distributions over Web pages using graph aggregation. The basic idea is to partition the graph into classes of quasi-equivalent vertices, project the page-based random walk to be approximated onto those classes, and compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web graph containing over 1.4 billion pages and over 6.6 billion links from a crawl of the Web conducted by AltaVista in September 2003. We were able to produce a ranking that has a Spearman rank-order correlation of 0.95 with respect to PageRank. The wall-clock time required by a simplistic implementation of our method was less than half the time required by a highly optimized implementation of PageRank, suggesting that larger speedup factors are probably possible.
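The three steps described above (partition, project, solve, then reconstruct) can be illustrated on a toy graph. The following is a minimal sketch under stated assumptions, not the authors' implementation. It projects a PageRank-style walk onto classes by weighting the pages of each class uniformly. It then solves the small class-level chain by power iteration and splits each class's mass among its pages according to a local PageRank computed on intra-class links. The uniform weighting, the local-PageRank reconstruction, the damping factor 0.85, and the names `stationary`, `google_chain`, `aggregate_pagerank`, and `spearman` are all illustrative assumptions. Every link endpoint is assumed to appear in `pages`.

```python
from collections import defaultdict


def stationary(states, trans, iters=200):
    """Power iteration for a row-stochastic chain given as trans[i] = {j: prob}."""
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.0 for s in states}
        for i in states:
            for j, q in trans[i].items():
                nxt[j] += p[i] * q
        p = nxt
    return p


def google_chain(nodes, links, alpha=0.85):
    """PageRank-style chain on the set `nodes`: with probability alpha follow a
    uniformly chosen out-link that stays inside `nodes`, otherwise jump to a
    uniformly random node. Nodes with no such out-link always jump."""
    n = len(nodes)
    outs = defaultdict(list)
    for u, v in links:
        if u in nodes and v in nodes:
            outs[u].append(v)
    trans = {}
    for u in nodes:
        if outs[u]:
            row = {v: (1 - alpha) / n for v in nodes}
            for v in outs[u]:
                row[v] += alpha / len(outs[u])
        else:
            row = {v: 1.0 / n for v in nodes}
        trans[u] = row
    return trans


def aggregate_pagerank(pages, links, cls, alpha=0.85):
    """Approximate PageRank by aggregation (hypothetical helper); cls maps
    each page to its class, for example its Web host."""
    pages = set(pages)
    n = len(pages)
    members = defaultdict(set)
    for p in pages:
        members[cls[p]].add(p)
    outs = defaultdict(list)
    for u, v in links:
        outs[u].append(v)

    # 1. Project the page-level walk onto classes, weighting the pages of a class
    #    uniformly (assumption; weighting by the true within-class PageRank
    #    would make the class totals exact).
    trans = {}
    for ci, mem in members.items():
        row = {cj: 0.0 for cj in members}
        w = 1.0 / len(mem)
        for u in mem:
            jump = (1 - alpha) if outs[u] else 1.0  # mass of u that teleports
            for cj, mem_j in members.items():
                row[cj] += w * jump * len(mem_j) / n  # uniform jump over pages
            for v in outs[u]:
                row[cls[v]] += w * alpha / len(outs[u])
        trans[ci] = row

    # 2. Stationary distribution of the small class-level chain.
    class_pr = stationary(list(members), trans)

    # 3. Reconstruct page scores: split each class's mass by a local PageRank
    #    computed on the links that stay inside the class (assumption).
    scores = {}
    for ci, mem in members.items():
        local = stationary(list(mem), google_chain(mem, links, alpha))
        for p in mem:
            scores[p] = class_pr[ci] * local[p]
    return scores


def spearman(x, y):
    """Spearman rank-order correlation of two score dicts with the same keys,
    via 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when neither has ties."""
    keys = list(x)
    rx = {k: r for r, k in enumerate(sorted(keys, key=x.get))}
    ry = {k: r for r, k in enumerate(sorted(keys, key=y.get))}
    n = len(keys)
    d2 = sum((rx[k] - ry[k]) ** 2 for k in keys)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Two limiting cases of this sketch recover exact PageRank. The first puts every page in its own class, so the projection is exact. The second puts all pages in one class, so the local PageRank is the global one. Host-level classes sit in between: the class chain is far smaller than the page chain, and the result is only an approximation. On a small graph, `spearman(aggregate_pagerank(...), stationary(pages, google_chain(set(pages), links)))` compares the approximate ranking with exact PageRank, in the spirit of the evaluation described above.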
