Stochastic models for the Web graph

The Web may be viewed as a directed graph in which each vertex is a static HTML Web page and each edge is a hyperlink from one page to another. We propose and analyze random graph models inspired by a series of empirical observations on the Web. Our graph models differ from the traditional G_{n,p} models in two ways: (1) Independently chosen edges do not produce the statistics (degree distributions, clique multitudes) observed on the Web, so edges in our model are statistically dependent on each other. (2) Our model introduces new vertices into the graph as time evolves, capturing the fact that the Web changes with time. Our results are twofold: we show that graphs generated using our model exhibit the statistics observed on the Web graph and, additionally, that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdős-Rényi models.
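To make the contrast concrete, the following is a minimal sketch rather than the model analyzed in the paper. The abstract does not spell out how edges are generated, so the copying rule below is an assumption chosen only because it yields dependent edges in a graph that grows over time. All names (`gnp_directed`, `evolving_copy_graph`, `alpha`, `d`) are hypothetical.

```python
import random
from collections import Counter


def gnp_directed(n, p, rng):
    """Directed G_{n,p}: every ordered pair (u, v), u != v, is an edge
    independently with probability p."""
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p]


def evolving_copy_graph(n, d, alpha, rng):
    """Grow a graph one vertex at a time; each new vertex adds d out-links.

    Assumed rule (illustrative only): the new vertex picks a uniformly
    random earlier vertex as a prototype. Each of its d links points to a
    uniformly random earlier vertex with probability alpha, and otherwise
    copies the prototype's link in the same position. Copying is what
    makes the edges statistically dependent.
    """
    out = {0: [0] * d}  # seed vertex with d self-loops so copying is defined
    for v in range(1, n):
        proto = rng.randrange(v)
        out[v] = [rng.randrange(v) if rng.random() < alpha else out[proto][i]
                  for i in range(d)]
    return [(u, w) for u, targets in out.items() for w in targets]


def max_in_degree(edges):
    """Largest number of edges pointing at any single vertex."""
    return max(Counter(w for _, w in edges).values())


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 300, 3
    independent = gnp_directed(n, d / (n - 1), rng)  # about d out-links per vertex
    dependent = evolving_copy_graph(n, d, alpha=0.3, rng=rng)
    print("max in-degree, independent edges:", max_in_degree(independent))
    print("max in-degree, copied edges:     ", max_in_degree(dependent))
```

Both graphs have roughly the same number of edges, but under this assumed rule the copied links concentrate on a few early vertices, whereas in-degrees in the independent-edge graph stay close to their mean.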
