Exploiting the Block Structure of the Web for Computing PageRank

The web link graph has a nested block structure: the vast majority of hyperlinks connect pages on a host to other pages on the same host, and many of the remaining links connect pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank with a three-stage algorithm: (1) the local PageRanks of the pages on each host are computed independently, using only the link structure of that host; (2) these local PageRanks are weighted by the "importance" of the corresponding host; and (3) the standard PageRank algorithm is run with the weighted aggregate of the local PageRanks as its starting vector. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
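The three stages can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: host "importance" is taken to be PageRank on a host-level graph whose edge weights are link counts, teleportation is uniform, and dangling pages spread their rank uniformly. The names `pagerank`, `block_rank`, `links`, and `host_of` are hypothetical.

```python
from collections import defaultdict


def pagerank(out_links, nodes, damping=0.85, start=None, tol=1e-10, max_iter=1000):
    """Power iteration with uniform teleport. Returns (ranks, iterations used).

    Repeated targets in an out-link list act as integer edge weights.
    """
    n = len(nodes)
    rank = dict(start) if start is not None else {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by pages without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            return rank, iteration
    return rank, max_iter


def block_rank(links, host_of, damping=0.85):
    """Three-stage sketch: local ranks, host weighting, then global PageRank."""
    nodes = sorted(host_of)
    pages_by_host = defaultdict(list)
    for page in nodes:
        pages_by_host[host_of[page]].append(page)

    # Stage 1: local PageRank per host, using only intra-host links.
    local = {}
    for host, pages in pages_by_host.items():
        intra = {p: [q for q in links.get(p, []) if host_of[q] == host] for p in pages}
        ranks, _ = pagerank(intra, pages, damping)
        local.update(ranks)

    # Stage 2: host importance (assumption: PageRank on the host graph,
    # one edge per page-level link), used to weight the local ranks.
    hosts = sorted(pages_by_host)
    host_links = {h: [] for h in hosts}
    for p in nodes:
        for q in links.get(p, []):
            host_links[host_of[p]].append(host_of[q])
    host_rank, _ = pagerank(host_links, hosts, damping)
    start = {p: local[p] * host_rank[host_of[p]] for p in nodes}

    # Stage 3: standard PageRank on the full graph from the weighted start vector.
    return pagerank(links, nodes, damping, start=start)


# Toy two-host graph (made-up data for illustration only).
links = {
    "a.com/1": ["a.com/2", "a.com/3"],
    "a.com/2": ["a.com/1"],
    "a.com/3": ["a.com/1", "b.com/1"],
    "b.com/1": ["b.com/2"],
    "b.com/2": ["b.com/1", "a.com/1"],
}
host_of = {page: page.split("/")[0] for page in links}
ranks, iters = block_rank(links, host_of)
```

Because stage 3 is ordinary PageRank, the sketch converges to the same vector as a run started from the uniform distribution; only the starting point, and therefore the number of iterations needed, differs. On a graph this small the difference in iteration count is not meaningful.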
