Using PageRank to Characterize Web Structure

Recent work on modeling the Web graph has focused on capturing the degree distributions observed on the Web. Because degree is a "local" property of the Web graph, such models rely heavily on local structure. We instead study the distribution of PageRank values (used in the Google search engine) on the Web. This distribution is of independent interest for optimizing search indices and storage. We show that PageRank values on the Web follow a power law. We then develop detailed models for the Web graph that explain this observation while remaining faithful to previously studied degree distributions. We analyze these models and compare the analyses both to snapshots of the Web and to graphs generated by simulating the new models. To our knowledge, this is the first modeling of the Web that goes beyond fitting degree distributions.
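For readers unfamiliar with the quantity whose distribution is studied here, the following is a minimal sketch of PageRank computed by power iteration. It is an illustration under stated assumptions, not the procedure used in the paper: the damping factor of 0.85, the uniform redistribution of rank from pages without out-links, the function name `pagerank`, and the four-page `toy_graph` are all choices made for this example.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    out_links maps each node to the list of nodes it links to.
    damping=0.85 is an assumed, commonly quoted value.
    """
    # Collect every node that appears as a source or as a link target.
    nodes = sorted(set(out_links) | {v for ts in out_links.values() for v in ts})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}

        # Each page passes an equal share of its damped rank to its targets.
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share

        # Stop once the total change between iterations is below tol.
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph, for illustration only.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

In this toy graph, page `c` receives links from three pages and ends up with the largest value, while page `d` has no in-links and keeps only the baseline share. Studying the distribution of such values over a full Web snapshot, rather than over four pages, is what the abstract describes.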
