Crawling on web graphs

We consider a simple model of an agent (which we call a spider) moving between the nodes of a randomly growing web graph. The agent is presumed to examine the page content of each node it visits for some specific topic. In our model the spider makes a random walk on the existing set of vertices. We compare the success of the spider on web graphs of two distinct types. In the random graph model, in which each new vertex joins its edges to existing vertices chosen uniformly at random, the expected proportion of unvisited vertices tends to 0.57. In the comparable copy-based model, in which each new vertex joins its edges to existing vertices with probability proportional to vertex degree, the expected proportion of unvisited vertices tends to 0.59.

A web graph is a sparse connected graph designed to capture some properties of the WWW. Studies of the graph structure of the WWW were made by [4] and [7], among others. Many web graph models have been designed to capture the structure found in these studies; see, for example, [1], [2], [3], [5], [6], [8], [9], [10], [12] and [13]. In the simple models we consider, each new vertex directs m edges towards existing vertices, either uniformly at random (random graph model) or according to the degree of existing vertices (copy model). Once a vertex has been added, the direction of its edges is ignored.

There are several types of search which might be applied to the WWW. Complete searches of the web, usually in a breadth-first manner, are carried out by search engines. Link and page data for visited pages is stored, and from the link
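
The two growth rules and the spider's walk described above can be illustrated with a small simulation. The following Python sketch is only one possible reading of the model, not the authors' procedure. The function name unvisited_fraction, the starting graph (two vertices joined by m parallel edges), the permission of parallel edges, and the schedule of one spider step after each new vertex are all assumptions made for illustration. The sketch is not claimed to reproduce the limiting values quoted above.

```python
import random


def unvisited_fraction(n, m, preferential, seed=None):
    """Grow a graph to n vertices while a spider walks on it.

    Assumptions (not taken from the source): the graph starts as two
    vertices joined by m parallel edges, parallel edges are allowed,
    and the spider takes exactly one step after each vertex is added.
    """
    if n < 2 or m < 1:
        raise ValueError("need n >= 2 and m >= 1")
    rng = random.Random(seed)
    # Undirected multigraph stored as adjacency lists.
    adj = {0: [1] * m, 1: [0] * m}
    # Each edge contributes both endpoints, so a uniform draw from this
    # list selects a vertex with probability proportional to its degree.
    endpoints = [0, 1] * m
    pos = 0
    visited = {0}
    for v in range(2, n):
        # Choose all m targets among the existing vertices 0..v-1.
        if preferential:
            targets = [rng.choice(endpoints) for _ in range(m)]
        else:
            targets = [rng.randrange(v) for _ in range(m)]
        adj[v] = list(targets)
        for u in targets:
            adj[u].append(v)
            endpoints.extend((u, v))
        # Edge direction is ignored: the spider moves along a uniformly
        # chosen incident edge of its current vertex.
        pos = rng.choice(adj[pos])
        visited.add(pos)
    return 1 - len(visited) / n


if __name__ == "__main__":
    for name, pref in (("uniform", False), ("degree-proportional", True)):
        runs = [unvisited_fraction(20000, 3, pref, seed=s) for s in range(5)]
        print(name, sum(runs) / len(runs))
```

Averaging over several seeds, as in the final loop, gives a rough empirical estimate of the unvisited proportion for a chosen n and m under these assumptions.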

[1] P. Flajolet. On approximate counting, 1982.

[2] Mihaela Enachescu, et al. Variations on Random Graph Models for the Web, 2001.

[3] Andrei Z. Broder, et al. Graph structure in the Web, 2000, Comput. Networks.

[4] Amos Fiat, et al. Web search via hub synthesis, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[5] Albert-László Barabási, et al. Internet: Diameter of the World-Wide Web, 1999, Nature.

[6] Béla Bollobás, et al. The degree sequence of a scale-free random graph process, 2001, Random Struct. Algorithms.

[7] Alan M. Frieze, et al. A general model of web graphs, 2003, Random Struct. Algorithms.

[8] Alan M. Frieze, et al. A General Model of Undirected Web Graphs, 2001, ESA.

[9] Micah Adler, et al. Towards compressing Web graphs, 2001, Proceedings DCC 2001. Data Compression Conference.

[10] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[11] Mark Jerrum, et al. Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains, 1987, WG.

[12] Fan Chung Graham, et al. Random evolution in massive graphs, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[13] Marc Najork, et al. Measuring Index Quality Using Random Walks on the Web, 1999, Comput. Networks.

[14] Eli Upfal, et al. The Web as a graph, 2000, PODS.

[15] Eli Upfal, et al. Stochastic models for the Web graph, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[16] Béla Bollobás, et al. The Diameter of a Scale-Free Random Graph, 2004, Comb.