The content and access dynamics of a busy Web site: findings and implications

In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites on the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and the client accesses made to the server. The former concerns the content creation and modification process; the latter concerns page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter α that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consistency algorithms), prefetching, and server-based "push" of Web content.
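A Zipf-like distribution means that the request frequency of the file with popularity rank r is roughly proportional to 1/r^α. A larger α therefore means that requests are more heavily concentrated on the few most popular files. The following is a minimal sketch of one common way to estimate α: a plain least-squares fit of log frequency against log rank over a list of requested file identifiers. The function name `estimate_zipf_alpha` is hypothetical, and this is not necessarily the fitting procedure used in the paper.

```python
import math
from collections import Counter


def estimate_zipf_alpha(requests):
    """Estimate alpha in freq(rank r) ~ C / r**alpha.

    Sketch only: fits a straight line to (log rank, log frequency) by
    ordinary least squares and returns the negated slope.
    `requests` is any iterable of hashable file identifiers.
    """
    # Request count per distinct file, most popular first (rank 1).
    counts = sorted(Counter(requests).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct files to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)

    # log freq = log C - alpha * log rank, so alpha is minus the slope.
    return -cov / var


# Synthetic trace whose counts follow 1200 / r exactly, i.e. alpha = 1.
trace = ["a"] * 1200 + ["b"] * 600 + ["c"] * 400 + ["d"] * 300
print(round(estimate_zipf_alpha(trace), 6))
```

On real traces, the log-log fit is sensitive to the long tail of rarely requested files, so estimates are often reported over a restricted range of ranks. The sketch above does not do that.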
