An adaptive bloom filter cache partitioning scheme for multicore architectures

This paper investigates the problem of partitioning the shared last-level cache of multicore architectures. Contention for this shared resource has been shown to severely degrade performance when multiple applications run concurrently. As architectures incorporate more cores, multi-application workloads become increasingly attractive, further exacerbating contention at the last-level cache. Today, cache replacement policies that were studied extensively for uniprocessor systems are being employed in new multicore architectures with little, if any, adaptation. However, the parameters of these new systems are likely to be different. The least recently used (LRU) policy, for example, which is widely accepted as the best replacement policy for uniprocessor caches, often results in poor resource sharing in a multicore system, signalling the importance of reevaluating the effectiveness of these policies on the new architectures. This paper proposes adaptive Bloom filter cache partitioning (ABFCP), a low-cost, dynamic cache partitioning mechanism that shares the last-level cache more effectively than LRU, improving the performance of an eight-core system by 5.92% on average over the LRU policy. Moreover, the proposed scheme provides performance benefits equivalent to those gained from an almost 50% increase in last-level cache size, and its benefit grows as the number of cores rises.
