Paper information - Efficient PageRank approximation via graph aggregation

Efficient PageRank approximation via graph aggregation

We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. We (1) partition the Web's graph into classes of quasi-equivalent vertices, (2) project the page-based random walk to be approximated onto those classes, and (3) compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web graph containing over 1.4 billion pages, and were able to produce a ranking that has a Spearman rank-order correlation of 0.95 with respect to PageRank. A simplistic implementation of our method required less than half the running time of a highly optimized implementation of PageRank, implying that larger speedup factors are probably possible.
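The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `aggregate_pagerank`, the `host_of` mapping, the damping factor of 0.85, and the choice to weight host-to-host transitions by the number of page links are all assumptions made for illustration. The abstract does not say how the page distribution is reconstructed, so the sketch simply splits each host's score uniformly among its pages.

```python
from collections import defaultdict


def aggregate_pagerank(edges, host_of, damping=0.85, iterations=100):
    """Approximate page scores from a host-level random walk (illustrative sketch).

    edges:   iterable of (source_page, target_page) links
    host_of: dict mapping every page to its class (here: its Web host)
    """
    pages = sorted(host_of)
    hosts = sorted(set(host_of.values()))

    # Step 1: partition pages into classes, one class per host.
    members = defaultdict(list)
    for page in pages:
        members[host_of[page]].append(page)

    # Step 2: project the page-level walk onto classes.
    # Assumption: a host-to-host transition is weighted by the number of page links.
    weight = defaultdict(float)
    out_total = defaultdict(float)
    for src, dst in edges:
        a, b = host_of[src], host_of[dst]
        weight[(a, b)] += 1.0
        out_total[a] += 1.0

    # Step 3: stationary distribution of the class-level walk by power iteration.
    n = len(hosts)
    rank = {h: 1.0 / n for h in hosts}
    for _ in range(iterations):
        # Hosts without out-links spread their mass uniformly over all hosts.
        dangling = sum(rank[h] for h in hosts if out_total[h] == 0.0)
        nxt = {h: (1.0 - damping) / n + damping * dangling / n for h in hosts}
        for (a, b), w in weight.items():
            nxt[b] += damping * rank[a] * w / out_total[a]
        rank = nxt

    # Reconstruction (assumed rule): split each host's score evenly among its pages.
    page_rank = {p: rank[host_of[p]] / len(members[host_of[p]]) for p in pages}
    return rank, page_rank


if __name__ == "__main__":
    # Hypothetical toy graph with three hosts and four pages.
    host_of = {
        "a.com/1": "a.com",
        "a.com/2": "a.com",
        "b.com/1": "b.com",
        "c.com/1": "c.com",
    }
    edges = [
        ("a.com/1", "a.com/2"),
        ("a.com/1", "b.com/1"),
        ("b.com/1", "a.com/2"),
        ("c.com/1", "a.com/1"),
    ]
    host_rank, page_rank = aggregate_pagerank(edges, host_of)
    for page, score in sorted(page_rank.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

The iteration runs over hosts rather than pages, which is where any speedup would come from: the host graph is much smaller than the page graph. A uniform split cannot distinguish pages on the same host, so a finer within-host rule would be needed to rank them against each other.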

Andrei Z. Broder | Farzin Maghoul | Jan O. Pedersen | Ronny Lempel
