Set Utilization Based Dynamic Shared Cache Partitioning

As the number of processors sharing a cache increases, conflict misses due to interference amongst competing processes have an increasing impact on the individual performance of processes. Cache partitioning is a method of allocating a cache between concurrently executing processes in order to counteract the effects of inter-process conflicts. However, cache partitioning methods commonly divide a shared cache into private partitions dedicated to a single processor, which can lead to underutilized portions of the cache when set accesses are non-uniform. Our proposed method compliments these cache partitioning algorithms by creating an additional shared partition able to be shared amongst all processors. Underutilized areas of the cache are identified by a monitoring circuit and used for the shared partition. Detection of underutilization is based on the number of unique set accesses for a given allocated way. For a 16-way set associative cache, the implementation of our method requires 64 bytes of storage overhead per core in addition to that needed for the method that determines the sizes of the private partitions. For the tested system, our method is able to improve performance over the traditional LRU policy for a number of selected benchmark sets by an average of 1.4% and up to 13.3% for a two core system and an average of 1.4% and up to 7.8% for a four core system, and is able to improve the performance of a conventional cache partitioning method (Utility-Based Cache Partitioning) by an average of 0.1% and up to 0.5% for both a two and four core systems.

[1]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[2]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[3]  Per Stenström,et al.  An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[4]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[5]  Yan Solihin,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[6]  Jichuan Chang,et al.  Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[7]  Jichuan Chang,et al.  Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.

[8]  Gabriel H. Loh,et al.  PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.

[9]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[10]  Mateo Valero,et al.  Eliminating cache conflict misses through XOR-based placement functions , 1997, ICS '97.

[11]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[12]  G. Edward Suh,et al.  Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.