The content and access dynamics of a busy Web server (poster)

In this paper, we study the dynamics of one of the busiest Web sites in the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and client accesses made to the server. The former considers the content creation and modification process while the latter considers page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter α that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consistency algorithms) and prefetching or serverbased “push” of Web content.

[1]  V. Rich Personal communication , 1989, Nature.

[2]  Ray Jain,et al.  The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.

[3]  Virgílio A. F. Almeida,et al.  Characterizing reference locality in the WWW , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[4]  Martin F. Arlitt,et al.  Web server workload characterization: the search for invariants , 1996, SIGMETRICS '96.

[5]  Jeffrey C. Mogul,et al.  Using predictive prefetching to improve World Wide Web latency , 1996, CCRV.

[6]  Margo I. Seltzer,et al.  World Wide Web Cache Consistency , 1996, USENIX Annual Technical Conference.

[7]  Eric A. Brewer,et al.  System Design Issues for Internet Middleware Services: Deductions from a Large Client Trace , 1997, USENIX Symposium on Internet Technologies and Systems.

[8]  Anja Feldmann,et al.  Potential benefits of delta encoding and data compression for HTTP , 1997, SIGCOMM '97.

[9]  Anja Feldmann,et al.  Rate of Change and other Metrics: a Live Study of the World Wide Web , 1997, USENIX Symposium on Internet Technologies and Systems.

[10]  Edith Cohen,et al.  Improving end-to-end performance of the Web using server volumes and proxy filters , 1998, SIGCOMM '98.

[11]  Ophir Frieder,et al.  Information Retrieval: Algorithms and Heuristics , 1998 .

[12]  David R. Cheriton,et al.  OTERS (on-tree efficient recovery using subcasting): a reliable multicast protocol , 1998, Proceedings Sixth International Conference on Network Protocols (Cat. No.98TB100256).

[13]  Joseph D. Touch,et al.  LSAM Proxy Cache: A Multicast Distributed Virtual Cache , 1998, Comput. Networks.

[14]  Anja Feldmann,et al.  Performance of Web proxy caching in heterogeneous bandwidth environments , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[15]  Hongsuda Tangmunarunkit,et al.  Scaling of multicast trees: comments on the Chuang-Sirbu scaling law , 1999, SIGCOMM '99.

[16]  William LeFebvre,et al.  Rapid Reverse DNS Lookups for Web Servers , 1999, USENIX Symposium on Internet Technologies and Systems.

[17]  G. Broll,et al.  Microsoft Corporation , 1999 .

[18]  Wei Lin,et al.  Web prefetching between low-bandwidth clients and proxies: potential and performance , 1999, SIGMETRICS '99.

[19]  David R. Cheriton,et al.  Scalable Web Caching of Frequently Updated Objects Using Reliable Multicast , 1999, USENIX Symposium on Internet Technologies and Systems.

[20]  Scott Shenker,et al.  A scalable Web cache consistency architecture , 1999, SIGCOMM '99.

[21]  Li Fan,et al.  Web caching and Zipf-like distributions: evidence and implications , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[22]  Alec Wolman,et al.  Organization-Based Analysis of Web-Object Sharing and Caching , 1999, USENIX Symposium on Internet Technologies and Systems.

[23]  Jeffrey C. Mogul,et al.  Network Behavior of a Busy Web Server and its Clients , 1999 .

[24]  Jin Zhang,et al.  Active Cache: caching dynamic contents on the Web , 1999, Distributed Syst. Eng..

[25]  G. Voelker,et al.  On the scale and performance of cooperative Web proxy caching , 2000, OPSR.

[26]  Venkata N. Padmanabhan,et al.  The content and access dynamics of a busy web site: findings and implicatins , 2000, SIGCOMM.

[27]  Amin Vahdat,et al.  Active Names: flexible location and transport of wide-area resources , 1999, Proceedings DARPA Active Networks Conference and Exposition.