Asymptotic Optimality of the Static Frequency Caching in the Presence of Correlated Requests

Renewed interest in caching algorithms stems from their application to content distribution on the Web. When documents are of equal size and their requests are independent and identically distributed, it is well known that the static algorithm that keeps the most frequently requested documents in the cache is optimal. However, there are no explicit caching algorithms that are provably optimal when the requests are statistically correlated. In this paper, we show, perhaps somewhat surprisingly, that keeping the most frequently requested documents in the cache is still optimal for large cache sizes even if the requests are strongly correlated. We model the statistical dependency of requests using semi-Markov modulated processes, which can capture strong statistical correlation, including the empirically observed long-range dependence in Web access sequences. Although the frequency algorithm and its practical version, the least-frequently-used policy, are not commonly used in practice due to their complexity and static nature, our result provides a benchmark for evaluating popular heuristic schemes. In particular, an important corollary of our main theorem and a recent result from [9] is that the widely used least-recently-used heuristic is asymptotically near-optimal under semi-Markov modulated requests and generalized Zipf's law document frequencies.
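As a rough illustration of the benchmark role described above, the following minimal sketch compares the miss probability of static frequency caching with a simulated least-recently-used (LRU) cache. It covers only the classical baseline of independent, identically distributed requests, not the semi-Markov modulated model of the paper. The function names, the Zipf exponent, the catalog size, and the cache size are all hypothetical choices made for this example.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_probs(n_docs, alpha):
    """Generalized Zipf popularities: p_i proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def static_miss_prob(probs, cache_size):
    """Miss probability when the cache permanently holds the
    cache_size most frequently requested documents."""
    return 1.0 - sum(sorted(probs, reverse=True)[:cache_size])


def lru_miss_rate(requests, cache_size):
    """Empirical miss rate of an LRU cache on a request sequence
    (the cache starts empty, so cold-start misses are counted)."""
    cache = OrderedDict()
    misses = 0
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)         # mark as most recently used
        else:
            misses += 1
            cache[doc] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(requests)


def simulate(n_docs=1000, alpha=1.2, cache_size=100,
             n_requests=20000, seed=0):
    """Return (static miss probability, simulated LRU miss rate).
    Requests are drawn i.i.d.; all parameters are illustrative."""
    rng = random.Random(seed)
    probs = zipf_probs(n_docs, alpha)
    requests = rng.choices(range(n_docs),
                           cum_weights=list(accumulate(probs)),
                           k=n_requests)
    return static_miss_prob(probs, cache_size), lru_miss_rate(requests, cache_size)


if __name__ == "__main__":
    static_miss, lru_miss = simulate()
    print(f"static frequency miss probability: {static_miss:.4f}")
    print(f"simulated LRU miss rate:           {lru_miss:.4f}")
```

Under these assumptions the static policy should show the lower miss figure, which is the sense in which it serves as a benchmark for heuristics such as LRU. Reproducing the correlated-request setting of the paper would require replacing the i.i.d. sampler with a modulated request process, which this sketch does not attempt.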

[1] Martin F. Arlitt, et al. Web server workload characterization: the search for invariants, 1996, SIGMETRICS '96.

[2] Armand M. Makowski, et al. Optimal replacement policies for nonuniform cache objects with optional eviction, 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[3] Shudong Jin, et al. GreedyDual* Web Caching Algorithm, 2000.

[4] Azer Bestavros, et al. Sources and characteristics of Web temporal locality, 2000, Proceedings 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.PR00728).

[5] Predrag R. Jelenkovic, et al. The persistent-access-caching algorithm, 2008, Random Struct. Algorithms.

[6] Peter J. Denning, et al. Operating Systems Theory, 1973.

[7] Marc Abrams, et al. Proxy Caching That Estimates Page Load Delays, 1997, Comput. Networks.

[8] Predrag R. Jelenkovic, et al. Least-recently-used caching with dependent requests, 2004, Theor. Comput. Sci.

[9] Sandy Irani, et al. Cost-Aware WWW Proxy Caching Algorithms, 1997, USENIX Symposium on Internet Technologies and Systems.

[10] Virgílio A. F. Almeida, et al. Characterizing reference locality in the WWW, 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[11] François Baccelli, et al. Elements Of Queueing Theory, 1994.

[12] P. Jelenkovic, et al. Near optimality of the discrete persistent access caching algorithm, 2004.

[13] Henry D. Shapiro, et al. Algorithms from P to NP (vol. 1): design and efficiency, 1991.

[14] Predrag R. Jelenkovic, et al. Optimizing LRU Caching for Variable Document Sizes, 2004, Combinatorics, Probability and Computing.

[15] Li Fan, et al. Web caching and Zipf-like distributions: evidence and implications, 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[16] Predrag R. Jelenkovic, et al. Asymptotic insensitivity of least-recently-used caching to statistical dependency, 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).