Estimating Application Cache Requirement for Provisioning Caches in Virtualized Systems

Miss rate curves (MRCs) are a fundamental tool for determining the impact of caches on an application's performance. In our research, we use MRCs to provision caches for applications in a consolidated environment. Current techniques for building MRCs at the CPU cache level require changes to the applications and are restricted to a few processor architectures [7], [22]. In this work, we investigate two techniques to partition the shared L2 and L3 caches in a server and build MRCs for the VMs. These techniques make different trade-offs among accuracy, flexibility, and intrusiveness. The first technique is based on operating system (OS) page coloring and requires no changes to commodity hardware or to applications. We improve upon existing page-coloring approaches by identifying and overcoming a subtle but real problem, the unequal loading of associative cache sets, in order to implement accurate cache allocation. Our second technique, called Cache Grabber, is even less intrusive and requires no changes to hardware, OS, or applications. We present a comprehensive evaluation of the relative merits of these and other techniques for estimating MRCs. Our evaluation enables data center administrators to select the technique best suited to their specific data center when provisioning caches for consolidated applications.
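To make the notion of an MRC concrete, the following is a minimal illustrative sketch in Python. It is not either of the techniques investigated in this work, which build MRCs by partitioning real caches; it is the classic offline approach of computing LRU stack distances over a recorded trace. It assumes a fully associative LRU cache measured in blocks, and the function name `miss_rate_curve` and the toy trace are hypothetical.

```python
from collections import OrderedDict


def miss_rate_curve(trace, max_size):
    """Miss rate of a fully associative LRU cache for sizes 1..max_size blocks.

    Illustrative only: a simple O(n * m) stack-distance pass over a trace of
    block identifiers (assumed hashable).
    """
    if not trace:
        return [0.0] * max_size

    stack = OrderedDict()          # LRU stack; most recently used block is last
    hist = [0] * (max_size + 1)    # hist[d] = accesses with stack distance d

    for block in trace:
        if block in stack:
            keys = list(stack.keys())
            # Distance 1 means the block was the most recently used one.
            depth = len(keys) - keys.index(block)
            if depth <= max_size:
                hist[depth] += 1
            stack.move_to_end(block)
        else:
            # First reference: a miss at every cache size.
            stack[block] = True

    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size]         # a cache of `size` blocks hits all distances <= size
        curve.append((total - hits) / total)
    return curve


if __name__ == "__main__":
    print(miss_rate_curve([1, 2, 3, 1, 2, 3, 4, 1], 4))
```

For the toy trace above, the loop over three blocks only starts to hit once the cache holds three blocks, which is the kind of knee in the curve that cache provisioning aims to capture.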

[1] Peter J. Denning, et al. Working Sets Past and Present, 1980, IEEE Transactions on Software Engineering.

[2] Zhao Zhang, et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[3] Sandhya Dwarkadas, et al. Partitioning Multi-Threaded Processors with a Large Number of Threads, 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.

[4] David A. Wood, et al. Implementing stack simulation for highly-associative memories, 1991, SIGMETRICS '91.

[5] Akshat Verma, et al. WattApp: an application aware power meter for shared data centers, 2010, ICAC '10.

[6] Akshat Verma, et al. Generalized ERSS tree model: Revisiting working sets, 2010, Perform. Evaluation.

[7] Janak H. Patel, et al. Compiler Directed Memory Management Policy For Numerical Programs, 1985, SOSP.

[8] Akshat Verma, et al. Power-aware dynamic placement of HPC applications, 2008, ICS '08.

[9] Sang Lyul Min, et al. A low-overhead high-performance unified buffer management scheme that exploits sequential and looping references, 2000, OSDI.

[10] Sanjeev Kumar, et al. Dynamic tracking of page miss ratio curve for memory management, 2004, ASPLOS XI.

[11] Peter J. Denning, et al. The working set model for program behavior, 1968, CACM.

[12] Per Stenström, et al. An analytical model of the working-set sizes in decision-support systems, 2000, SIGMETRICS '00.

[13] Anoop Gupta, et al. Working sets, cache sizes, and node granularity issues for large-scale multiprocessors, 1993, ISCA '93.

[14] Michael Stumm, et al. Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[15] Vasanth Bala, et al. Dynamo: a transparent dynamic optimization system, 2000, SIGP.

[16] Alan P. Batson, et al. Measurements of major locality phases in symbolic reference strings, 1976, SIGMETRICS '76.

[17] Fabrice Bellard, et al. QEMU, a Fast and Portable Dynamic Translator, 2005, USENIX ATC, FREENIX Track.

[18] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[19] Rajeev Balasubramonian, et al. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, 2000, MICRO 33.

[20] Michael Stumm, et al. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations, 2009, ASPLOS.

[21] James E. Smith, et al. Managing multi-configuration hardware via dynamic working set analysis, 2002, ISCA.

[22] Daniel P. Siewiorek, et al. Practical solutions for QoS-based resource allocation problems, 1998, Proceedings 19th IEEE Real-Time Systems Symposium (Cat. No.98CB36279).

[23] Jim Zelenka, et al. Informed prefetching and caching, 1995, SOSP.