Managing Shared L2 Caches on Multicore Systems in Software

Most of today’s multi-core processors feature shared L2 caches. A major problem faced by such architectures is cache contention, where multiple cores compete for use of the single shared L2 cache. Uncontrolled sharing leads to scenarios where one core evicts useful L2 cache content belonging to another core. To address this problem, we have implemented a software mechanism in the operating system that partitions the shared L2 cache by guiding the allocation of physical pages. This mechanism, which can also be applied to virtual machine monitors, provides isolation that reduces contention. We show that it is effective in reducing cache contention in multiprogrammed SPECcpu2000 and SPECjbb2000 workloads, achieving performance improvements of up to 17% without adversely affecting co-scheduled applications.

To size L2 cache partitions effectively, a quantifiable metric is needed that predicts performance as a function of L2 cache size. For page management, Miss Rate Curves (MRCs) have proven useful for this purpose. For L2 cache sizing, however, we have found L2 MRCs to be inadequate and have found instruction retirement Stall Rate Curves (SRCs) to be more effective, where the stalls are those caused by memory latencies.
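Partitioning a shared cache by guiding physical page allocation is commonly known as page coloring. The following is a minimal Python sketch of the idea, not the authors' implementation. It assumes a physically indexed L2 whose way size is a multiple of the page size, so each physical frame maps to a fixed slice of the cache sets (its "color"). `num_colors`, `page_color`, and `ColoredAllocator` are hypothetical names; a real kernel would integrate this with its own free-page lists.

```python
from collections import deque


def num_colors(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Number of page colors: how many page-sized slices fit in one cache way."""
    way_bytes = cache_bytes // ways
    if way_bytes % page_bytes != 0:
        raise ValueError("way size must be a multiple of the page size")
    return way_bytes // page_bytes


def page_color(pfn: int, colors: int) -> int:
    """Color of a physical frame: which slice of the cache sets the frame maps to."""
    return pfn % colors


class ColoredAllocator:
    """Hypothetical physical-frame allocator keeping one free list per color."""

    def __init__(self, total_frames: int, colors: int):
        self.colors = colors
        self.free = [deque() for _ in range(colors)]
        for pfn in range(total_frames):
            self.free[page_color(pfn, colors)].append(pfn)
        self.partitions = {}  # partition id -> list of colors it may use
        self.cursor = {}      # partition id -> round-robin position

    def set_partition(self, pid, colors):
        """Restrict partition `pid` to the given set of colors."""
        self.partitions[pid] = list(colors)
        self.cursor[pid] = 0

    def alloc(self, pid):
        """Return a free frame whose color belongs to partition `pid`, or None."""
        allowed = self.partitions[pid]
        # Try each allowed color at most once, rotating to spread pages over sets.
        for _ in range(len(allowed)):
            color = allowed[self.cursor[pid] % len(allowed)]
            self.cursor[pid] += 1
            if self.free[color]:
                return self.free[color].popleft()
        return None

    def release(self, pfn):
        """Return a frame to the free list of its color."""
        self.free[page_color(pfn, self.colors)].append(pfn)
```

For example, a 2 MiB, 8-way L2 with 4 KiB pages has 64 colors. Restricting one core's partition to colors 0 to 15 confines its pages to a quarter of the cache sets, so they cannot evict lines from pages in a partition that owns colors 16 to 63.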

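The abstract does not specify how partition sizes are chosen once a curve is available. One plausible approach samples each application's curve at every possible number of colors and picks the split that minimizes the sum of the curve values, using a small dynamic program. `size_partitions` is a hypothetical helper, and summing per-application values is an assumed objective, not necessarily the paper's. The search itself accepts either MRCs or SRCs as input.

```python
import math


def size_partitions(curves, total_colors, min_colors=1):
    """Split total_colors among applications to minimize the summed curve values.

    curves[i][k] is application i's metric (e.g. memory stall rate) when it is
    given k colors, for k = 0..total_colors. Returns (best_total, allocation).
    """
    n = len(curves)
    inf = math.inf
    # best[i][c]: minimal summed metric for the first i apps using exactly c colors.
    best = [[inf] * (total_colors + 1) for _ in range(n + 1)]
    choice = [[0] * (total_colors + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        curve = curves[i - 1]
        for c in range(total_colors + 1):
            for k in range(min_colors, c + 1):
                prev = best[i - 1][c - k]
                if prev == inf:
                    continue
                cand = prev + curve[k]
                if cand < best[i][c]:
                    best[i][c] = cand
                    choice[i][c] = k
    if best[n][total_colors] == inf:
        raise ValueError("not enough colors for the minimum allocation")
    # Walk the recorded choices backwards to recover each application's share.
    alloc, c = [], total_colors
    for i in range(n, 0, -1):
        k = choice[i][c]
        alloc.append(k)
        c -= k
    return best[n][total_colors], alloc[::-1]
```

Fed with MRCs, this search minimizes total misses. The argument above is that SRCs make the resulting split track performance more closely, presumably because not every L2 miss stalls instruction retirement to the same degree.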