Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism

Die-stacked DRAM has been proposed for use as a large, high-bandwidth, last-level cache with hundreds or thousands of megabytes of capacity. Not all workloads (or phases) can productively utilize this much cache space, however. Unfortunately, the unused (or under-used) cache continues to consume power due to leakage in the peripheral circuitry and periodic DRAM refresh. Dynamically adjusting the available DRAM cache capacity could largely eliminate this energy overhead. However, the current proposed DRAM cache organization introduces new challenges for dynamic cache resizing. The organization diers from a conventional SRAM cache organization because it places entire cache sets and their tags within a single bank to reduce on-chip area and power overhead. Hence, resizing a DRAM cache requires remapping sets from the powered-down banks to active banks. In this paper, we propose CRUNCH (Cache Resizing Using Native Consistent Hashing), a hardware data remapping scheme inspired by consistent hashing, an algorithm originally proposed to uniformly and dynamically distribute Internet trac across a changing population of web servers. CRUNCH provides a load-balanced remapping of data from the powered-down banks alone to the active banks, without requiring sets from all banks to be remapped, unlike naive schemes to achieve load balancing. CRUNCH remaps only sets from the powered-down banks, so it achieves this load balancing with low bank power-up/down transition latencies. CRUNCH’s combination of good load balancing and low transition latencies provides a substrate to enable ecient DRAM cache resizing.

[1]  Margaret Martonosi,et al.  Let caches decay: reducing leakage energy via exploitation of cache generational behavior , 2002, TOCS.

[2]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[3]  Stijn Eyerman,et al.  System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.

[4]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[5]  Alon Naveh,et al.  Power and Thermal Management in the Intel Core Duo Processor , 2006 .

[6]  Thomas F. Wenisch,et al.  PowerNap: eliminating server idle power , 2009, ASPLOS.

[7]  Mahmut T. Kandemir,et al.  VL-CDRAM: variable line sized cached DRAMs , 2003, First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721).

[8]  Per Stenström,et al.  Improving power efficiency of D-NUCA caches , 2007, CARN.

[9]  Abraham B. Bergman,et al.  A Comprehensive Approach , 1989, Clinical pediatrics.

[10]  Mahmut T. Kandemir,et al.  DRAM energy management using software and hardware directed power mode control , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[11]  Bruce Jacob,et al.  Fine-Grained Activation for Power Reduction in DRAM , 2010, IEEE Micro.

[12]  Mark D. Hill,et al.  Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Eric Rotenberg,et al.  Adaptive mode control: a static-power-efficient cache design , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[14]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[15]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[16]  Young-Hyun Jun,et al.  A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os using TSV-based stacking , 2011, 2011 IEEE International Solid-State Circuits Conference.

[17]  Ramon Canal,et al.  Using Coherence Information and Decay Techniques to Optimize L2 Cache Leakage in CMPs , 2009, 2009 International Conference on Parallel Processing.

[18]  Kaushik Roy,et al.  An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[19]  Norman P. Jouppi,et al.  Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.

[20]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[21]  Babak Falsafi,et al.  Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[22]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[23]  Yan Solihin,et al.  CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[24]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[25]  Hsien-Hsin S. Lee,et al.  Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[26]  Yuan Xie,et al.  Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[27]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[28]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[29]  Kaushik Roy,et al.  Reducing leakage in a high-performance deep-submicron instruction cache , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[30]  Onur Mutlu,et al.  Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.

[31]  Mark D. Hill,et al.  Virtual hierarchies to support server consolidation , 2007, ISCA '07.

[32]  Michael F. P. O'Boyle,et al.  IATAC: a smart predictor to turn-off L2 cache lines , 2005, TACO.

[33]  Calvin Lin,et al.  A comprehensive approach to DRAM power management , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[34]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).