Paper information

DR-SNUCA: An energy-scalable dynamically partitioned cache

Multicore processors have become ubiquitous across many domains, such as datacenters and smartphones. As the number of processing elements increases within these processors, so does the pressure to share the critical on-chip cache resources, but this must be done energy-efficiently and without sacrificing resource guarantees. We propose a scalable dynamic cache-partitioning scheme, DR-SNUCA, which provides an energy-efficient way to reduce resource interference over caches shared among many processing elements. Our results show that DR-SNUCA reduces system energy consumption by 16.3% compared to associatively partitioned caches, such as DNUCA.
