论文信息 - Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms

Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms

We present a methodology for off-chip memory bandwidth minimization through application-driven L2 cache partitioning in multi-core systems. A major challenge with multi-core system design is the widening gap between the memory demand generated by the processor cores and the limited off-chip memory bandwidth and memory service speed. This severely restricts the number of cores that can be integrated into a multi-core system and the parallelism that can be actually achieved and efficiently exploited for not only memory demanding applications, but also for workloads consisting of many tasks utilizing a large number of cores and thus exceeding the available off-chip bandwidth. Last level shared cache partitioning has been shown to be a promising technique to enhance cache utilization and reduce missrates. While most cache partitioning techniques focus on cache miss rates, our work takes a different approach in which tasks' memory bandwidth requirements are taken into account when identifying a cache partitioning for multi-programmed and/or multithreaded workloads. Cache resources are allocated with the objective that the overall system bandwidth requirement is minimized for the target workload. The key insight is that cache miss-rate information may severely misrepresent the actual bandwidth demand of the task, which ultimately determines the overall system performance and power consumption.

Chenjie Yu | Peter Petrov | Chenjie Yu | Peter Petrov

[1] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[2] D. Burger,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[3] Frank Vahid,et al. A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[4] David H. Albonesi,et al. Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[5] Frank Vahid,et al. A Self-Tuning Configurable Cache , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[6] Jichuan Chang,et al. Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.

[7] David K. Tam,et al. Managing Shared L2 Caches on Multicore Systems in Software , 2007 .

[8] Babak Falsafi,et al. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[9] Mahmut T. Kandemir,et al. Organizing the last line of defense before hitting the memory wall for CMPs , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[10] Ronald G. Dreslinski,et al. The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[11] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.

[12] James E. Smith,et al. Virtual private caches , 2007, ISCA '07.

[13] Won-Taek Lim,et al. Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[15] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[16] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..