Performance Profiling of Cache Systems at Scale

Large-scale in-memory object caches such as memcached are widely used to accelerate popular web sites and to reduce the burden on backend databases. Operations and development teams tuning a cache tier would benefit from answers to questions such as “how much total memory should be allocated to the cache tier?” and “what is the minimum cache size for a given hit rate?” We propose a new lightweight online profiler, MIMIR, that hooks into the replacement policy of each cache server and periodically produces histograms of the overall cache hit rate as a function of memory size. It predicts the hit rate for cache sizes smaller than the current allocation with 99% accuracy on average, at high performance. Predicting the hit rate for cache sizes larger than the current allocation requires metadata for some evicted keys. Keeping track of the metadata for all evicted keys is memory-intensive and, under intensive workloads, quickly fills up disk space. We propose a new, fast, and memory-efficient method for storing a specified amount of evicted metadata with automatic flushing using Counting Filters, an extension of Bloom Filters that supports removals. This method predicts the hit rate of a larger cache with 95% accuracy on average. Experiments with the profiler inside memcached showed that dynamic hit rate histograms are produced with a relatively low drop in throughput. Our evaluation thus suggests that online cache profiling can be a practical tool for improving the provisioning of large caches.

Icelandic title: Arangursvoktun a storum flýtiminniskerfum. Trausti Saemundsson, May 2014.
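To make the idea of "hit rate as a function of cache size" concrete, the following minimal Python sketch computes an exact hit rate curve for an LRU cache from a recorded access trace, using LRU stack distances. It is not MIMIR's algorithm: it is the naive textbook approach, with linear cost per access, and it assumes that cache size is counted in items rather than bytes. The function names `stack_distances` and `hit_rate_curve` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def stack_distances(trace):
    """Yield the LRU stack distance of each access, or None on a first access.

    The distance is the key's position from the most recently used end of
    the LRU stack (1 = most recently used). An LRU cache holding `size`
    items hits exactly when the distance is <= size.
    """
    stack = []  # least recently used first, most recently used last
    for key in trace:
        if key in stack:
            idx = stack.index(key)
            yield len(stack) - idx
            stack.pop(idx)
        else:
            yield None  # cold miss: the key was never seen before
        stack.append(key)  # the accessed key becomes most recently used


def hit_rate_curve(trace, max_size):
    """Return LRU hit rates for cache sizes 1..max_size (in items)."""
    trace = list(trace)
    if not trace:
        return [0.0] * max_size
    histogram = Counter(d for d in stack_distances(trace) if d is not None)
    curve = []
    hits = 0
    for size in range(1, max_size + 1):
        # Every access with distance <= size is a hit at this cache size.
        hits += histogram[size]
        curve.append(hits / len(trace))
    return curve


if __name__ == "__main__":
    example_trace = list("abcabcaab")
    for size, rate in enumerate(hit_rate_curve(example_trace, 4), start=1):
        print(f"cache size {size}: hit rate {rate:.2f}")
```

A single pass over the trace yields the hit rate for every cache size at once, which is what makes this kind of histogram useful for provisioning. Sizes beyond the number of distinct keys in the trace add no further hits, so the curve flattens there.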
