Web Observation from a User Perspective

How many pages on the Web are actually accessed by Web users? This question is of interest to both Web scientists and industry engineers. To answer it, this paper describes and studies the User Access Web (UA Web). Based on an analysis of large-scale Web user access logs, a sampling procedure is proposed that reduces sampling bias and draws near-uniform random pages from the UA Web using search engine interfaces and Monte Carlo methods. Experiments on about 675 million user log entries reveal several properties of the UA Web and of the indices of four search engines, including power-law distributions, average page length, search engine index sizes, and the characteristics of static and dynamic pages.