CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms

As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When running multiple applications simultaneously on a platform, it has been shown that contention for shared platform resources such as last-level cache can severely degrade performance and quality of service (QoS). But today's platforms do not have the capability to monitor shared cache usage accurately and disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of the use of shared cache resources along three vectors: (a) occupancy - how much space is being used and by whom, (b) interference - how much contention is present and who is being affected and (c) sharing - how are threads cooperating. We propose the CacheScouts monitoring architecture consisting of novel tagging (software-guided monitoring IDs), and sampling mechanisms (set sampling) to achieve shared cache monitoring on per application basis at low overhead (<0.1%) and with very little loss of accuracy (<5%). We also present case studies to show how CacheScouts can be used by operating systems (OS) and virtual machine monitors (VMMs) for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS and (d) metering for chargeback.

[1]  Ravi R. Iyer On modeling and analyzing cache hierarchies using CASPER , 2003, 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunications Systems, 2003. MASCOTS 2003..

[2]  Gil Neiger,et al.  Intel virtualization technology , 2005, Computer.

[3]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[4]  Christoforos E. Kozyrakis,et al.  From chaos to QoS: case studies in CMP resource management , 2007, CARN.

[5]  Won-Taek Lim,et al.  Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Shlomit S. Pinter,et al.  Data Sharing Conscious Scheduling for Multi-threaded Applications on SMP Machines , 2006, Euro-Par.

[7]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8]  David A. Wood,et al.  A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.

[9]  Mahmut T. Kandemir,et al.  Organizing the last line of defense before hitting the memory wall for CMPs , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[10]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[11]  Srihari Makineni,et al.  Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[13]  Margo Seltzer,et al.  CASC : A Cache-Aware Scheduling Algorithm For Multithreaded Chip Multiprocessors , 2005 .

[14]  Josep Torrellas,et al.  Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary , 1993, SIGMETRICS '93.

[15]  Yan Solihin,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[16]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.