Examining the Cacheability of User-Requested Web Resources

This paper continues work to monitor and better understand how resources change at servers and how those servers report metadata about the resources. It extends our previous work, which studied selected resources from popular web sites, to an actual trace of user requests. This approach allows study of a set of resources that users are known to retrieve. The results show that more cached resources could be reused than is currently realized, because cache directives are often inaccurate or nonexistent. For example, over 33% of HTML resources in the study do not change, but contain no last modification time or other cache directive in the response, so these resources cannot be cached and validated with the origin server. In addition, embedded images are often reused, even in pages that change frequently. This result points both to the need to cache such images and to the need to discard them when they are no longer included as part of any page. The last result of this work is that the inclusion of a cookie as part of a request does not necessarily make the response uncacheable. In most cases we obtained identical responses from two requests for the same URL with different cookies. These results imply that such responses can be cached and used for validation if other cache directives allow for it. In cases where the responses are not the same, they often differ only in the ad image contained.
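
The kind of check behind the HTML result above can be illustrated with a minimal Python sketch. It is not the measurement code used in the study. The function names, the set of headers treated as cache metadata (including ETag), and the use of a content digest to detect change are all assumptions made for illustration.

```python
import hashlib

# Assumed set of response headers that let a cache store or validate a resource.
# The abstract names only the last modification time and "other cache
# directives"; the remaining entries are an illustrative guess.
CACHE_HEADERS = {"last-modified", "etag", "expires", "cache-control"}


def has_cache_metadata(headers):
    """Return True if the response headers include any assumed cache header."""
    return any(name.lower() in CACHE_HEADERS for name in headers)


def is_unchanged(bodies):
    """Return True if every retrieved body (bytes) has the same digest."""
    digests = {hashlib.sha256(body).hexdigest() for body in bodies}
    return len(digests) == 1


def missed_opportunity(headers, bodies):
    """A resource that never changed across retrievals but carried no cache metadata."""
    return is_unchanged(bodies) and not has_cache_metadata(headers)
```

The same digest comparison could be applied to two responses for one URL requested with different cookies to test whether the cookie affects the content.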
