Web caching and Zipf-like distributions: evidence and implications

This paper addresses two unresolved issues about Web caching. The first issue is whether Web requests from a fixed user community are distributed according to Zipf's (1929) law. The second issue relates to a number of studies on the characteristics of Web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces. In particular, the question is whether these properties are inherent to Web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both Web cache resource planning and cache hierarchy design. We show that the answers to the two questions are related. We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is only (i) a weak correlation between the access frequency of a Web page and its size and (ii) a weak correlation between access frequency and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviour that are consistent with the experimental observations, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data. The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.

[1]  G. Zipf,et al.  Relative Frequency as a Determinant of Phonetic Change , 1930 .

[2]  Peter J. Denning,et al.  Operating Systems Theory , 1973 .

[3]  Peter J. Denning,et al.  Working Sets Past and Present , 1980, IEEE Transactions on Software Engineering.

[4]  V. Rich Personal communication , 1989, Nature.

[5]  Steven Glassman,et al.  A Caching Relay for the World Wide Web , 1994, Comput. Networks ISDN Syst..

[6]  Mark Crovella,et al.  Characteristics of WWW Client-based Traces , 1995 .

[7]  Virgílio A. F. Almeida,et al.  Characterizing reference locality in the WWW , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[8]  Margo I. Seltzer,et al.  World Wide Web Cache Consistency , 1996, USENIX Annual Technical Conference.

[9]  Sandy Irani,et al.  Cost-Aware WWW Proxy Caching Algorithms , 1997, USENIX Symposium on Internet Technologies and Systems.

[10]  Michael J. Feeley,et al.  The Measured Access Characteristics of World-Wide-Web Client Proxy Caches , 1997, USENIX Symposium on Internet Technologies and Systems.

[11]  Eric A. Brewer,et al.  System Design Issues for Internet Middleware Services: Deductions from a Large Client Trace , 1997, USENIX Symposium on Internet Technologies and Systems.

[12]  Darrell D. E. Long,et al.  Exploring the Bounds of Web Latency Reduction from Caching and Prefetching , 1997, USENIX Symposium on Internet Technologies and Systems.

[13]  Virgílio A. F. Almeida,et al.  Analyzing the behavior of a proxy server in light of regional and cultural issues , 1998 .

[14]  Hiroshi Tsuji,et al.  Memory-Based Architecture for Distributed WWW Caching Proxy , 1998, Comput. Networks.

[15]  Luigi Rizzo,et al.  Replacement policies for a proxy cache , 2000, TNET.