Approximating Hit Rate Curves using Streaming Algorithms

A hit rate curve is a function that maps cache size to the proportion of requests that can be served from the cache. (The caching policy and sequence of requests are assumed to be fixed.) Hit rate curves have been studied for decades in the operating system, database and computer architecture communities. They are useful tools for designing appropriate cache sizes, dynamically allocating memory between competing caches, and for summarizing locality properties of the request sequence. In this paper we focus on the widely-used LRU caching policy. Computing hit rate curves is very efficient from a runtime standpoint, but existing algorithms are not efficient in their space usage. For a stream of m requests for n cacheable objects, all existing algorithms that provably compute the hit rate curve use space linear in n. In the context of modern storage systems, n can easily be in the billions or trillions, so the space usage of these algorithms makes them impractical. We present the first algorithm for provably approximating hit rate curves for the LRU policy with sublinear space. Our algorithm uses O( p^2 * log(n) * log^2(m) / epsilon^2 ) bits of space and approximates the hit rate curve at p uniformly-spaced points to within additive error epsilon. This is not far from optimal. Any single-pass algorithm with the same guarantees must use Omega(p^2 + epsilon^{-2} + log(n)) bits of space. Furthermore, our use of additive error is necessary. Any single-pass algorithm achieving multiplicative error requires Omega(n) bits of space.

[1]  Alexander A. Razborov,et al.  On the Distributional Complexity of Disjointness , 1992, Theor. Comput. Sci..

[2]  Rajeev Rastogi,et al.  Tracking set-expression cardinalities over continuous update streams , 2004, The VLDB Journal.

[3]  Luca Trevisan,et al.  Counting Distinct Elements in a Data Stream , 2002, RANDOM.

[4]  Andrew Warfield,et al.  Characterizing Storage Workloads with Counter Stacks , 2014, OSDI.

[5]  P. Flajolet,et al.  Loglog counting of large cardinalities , 2003 .

[6]  P. Sadayappan,et al.  PARDA: A Fast Parallel Reuse Distance Analysis Algorithm , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[7]  Hjörtur Björnsson,et al.  Dynamic performance profiling of cloud caches , 2013, SoCC.

[8]  Frank Olken,et al.  Efficient methods for calculating the success function of fixed space replacement policies , 1983, Perform. Evaluation.

[9]  Noga Alon,et al.  The Space Complexity of Approximating the Frequency Moments , 1999 .

[10]  Irfan Ahmad,et al.  Efficient MRC Construction with SHARDS , 2015, FAST.

[11]  David A. Padua,et al.  Calculating stack distances efficiently , 2002, MSP/ISMM.

[12]  David Eklov,et al.  StatStack: Efficient modeling of LRU caches , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[13]  Alan Jay Smith,et al.  Two Methods for the Efficient Analysis of Memory Address Trace Data , 1977, IEEE Transactions on Software Engineering.

[14]  Piotr Indyk,et al.  Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..

[15]  Emery D. Berger,et al.  CRAMM: virtual memory support for garbage-collected applications , 2006, OSDI '06.

[16]  Amit Chakrabarti,et al.  An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance , 2012, SIAM J. Comput..

[17]  Chen Ding,et al.  Locality phase prediction , 2004, ASPLOS XI.

[18]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[19]  P. Flajolet,et al.  HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm , 2007 .

[20]  Andrea C. Arpaci-Dusseau,et al.  Geiger: monitoring the buffer cache in a virtual machine environment , 2006, ASPLOS XII.

[21]  Vincent J. Kruskal,et al.  LRU Stack Processing , 1975, IBM J. Res. Dev..

[22]  Jin Chen,et al.  Dynamic Resource Allocation for Database Servers Running on Virtual Storage , 2009, FAST.

[23]  John Turek,et al.  Optimal Partitioning of Cache Memory , 1992, IEEE Trans. Computers.

[24]  Chen Ding,et al.  Array regrouping and structure splitting using whole-program reference affinity , 2004, PLDI '04.

[25]  David P. Woodruff,et al.  An optimal algorithm for the distinct elements problem , 2010, PODS '10.

[26]  Sanjeev Kumar,et al.  Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.

[27]  Nimrod Megiddo,et al.  ARC: A Self-Tuning, Low Overhead Replacement Cache , 2003, FAST.

[28]  Eyal Kushilevitz,et al.  Communication Complexity , 1997, Adv. Comput..

[29]  R. Ostrovsky,et al.  Smooth Histograms for Sliding Windows , 2007, FOCS 2007.

[30]  Zachary Drudi,et al.  A streaming algorithms approach to approximating hit rate curves , 2014 .

[31]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..

[32]  Yutao Zhong,et al.  Predicting whole-program locality through reuse distance analysis , 2003, PLDI.