The Measured Access Characteristics of World-Wide-Web Client Proxy Caches

The growing popularity of the World Wide Web is placing tremendous demands on the Internet. A key strategy for scaling the Internet to meet these increasing demands is to cache data near clients, thus improving access latency and reducing network and server load. Unfortunately, research in this area has been hampered by a poor understanding of the locality and sharing characteristics of Web-client accesses. The recent popularity of Web proxy servers provides a unique opportunity to improve this understanding, because a small number of proxy servers see accesses from thousands of clients. This paper presents an analysis of access traces collected from seven proxy servers deployed in various locations throughout the Internet. The traces record a total of 47.4 million requests made by 23,700 clients over a twenty-one-day period. We use a combination of static analysis and trace-driven cache simulation to characterize the locality and sharing properties of these accesses. Our analysis shows that a 2- to 10-GB second-level cache yields hit rates between 24% and 45%, with 85% of these hits due to sharing among different clients. Caches with more clients exhibit more sharing and thus higher hit rates. Under the Squid and CERN proxy cache coherence protocols, between 2% and 7% of accesses are consistency misses to unmodified objects. Sharing is bimodal: requests for shared objects are divided evenly between objects that are narrowly shared and those that are shared by many clients, and widely shared objects also tend to be shared by clients from unrelated traces.
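To make the idea of trace-driven cache simulation concrete, here is a minimal sketch of how a proxy trace might be replayed through a byte-limited cache to measure the hit rate and the fraction of hits due to sharing. Several details are assumptions, not the paper's method: the trace format (hypothetical `(client, url, size)` tuples), LRU replacement, and the rule that a hit counts as a "shared hit" when the requesting client has not previously accessed the cached copy. Consistency misses are not modeled; every cached object is treated as always fresh.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class SimResult:
    requests: int = 0
    hits: int = 0
    shared_hits: int = 0  # hits where the requester had not accessed this cached copy before

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0

    @property
    def shared_fraction(self) -> float:
        return self.shared_hits / self.hits if self.hits else 0.0


def simulate_lru(trace, capacity_bytes):
    """Replay hypothetical (client, url, size) records through a byte-limited LRU cache.

    Assumptions: LRU replacement, fixed object sizes, no coherence checks.
    """
    cache = OrderedDict()  # url -> (size, set of clients that accessed this cached copy)
    used = 0
    result = SimResult()
    for client, url, size in trace:
        result.requests += 1
        if url in cache:
            _, clients = cache[url]
            result.hits += 1
            if client not in clients:  # hit made possible by another client's fetch
                result.shared_hits += 1
                clients.add(client)
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit; counted only as a miss
        while used + size > capacity_bytes:
            _, (evicted_size, _) = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[url] = (size, {client})
        used += size
    return result


if __name__ == "__main__":
    trace = [("a", "/x", 100), ("b", "/x", 100), ("a", "/x", 100),
             ("a", "/y", 300), ("c", "/x", 100)]
    r = simulate_lru(trace, capacity_bytes=350)
    print(r.hit_rate, r.shared_fraction)  # → 0.4 0.5
```

In this toy trace, client `b` hits on the object `a` fetched (a shared hit), `a`'s repeat access is a non-shared hit, and the large `/y` object evicts `/x`, so `c`'s later request misses. Running real traces through such a simulator at several capacities is one way to study how hit rate grows with cache size and client population.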
