Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies

As manycores use dynamic energy ever more efficiently, static power consumption becomes a major concern. In particular, in a large manycore running at a low voltage, leakage in on-chip memory modules contributes substantially to the chip's power draw. This is unfortunate, given that, intuitively, the large multi-level cache hierarchy of a manycore is likely to contain a lot of useless data. An effective way to reduce this problem is to use a low-leakage technology such as embedded DRAM (eDRAM). However, eDRAM requires refresh. In this paper, we examine the opportunity of minimizing on-chip memory power further by intelligently refreshing on-chip eDRAM. We present Refrint, a simple approach to perform fine-grained, intelligent refresh of on-chip eDRAM multiprocessor cache hierarchies. We introduce the Refrint algorithms and microarchitecture. We evaluate Refrint in a simulated manycore running 16-threaded parallel applications. We show that an eDRAM-based memory hierarchy with Refrint consumes only 30% of the energy of a conventional SRAM-based memory hierarchy, and induces a slowdown of only 6%. In contrast, an eDRAM-based memory hierarchy without Refrint consumes 56% of the energy of the conventional memory hierarchy, inducing a slowdown of 25%.

[1]  Balaram Sinharoy,et al.  POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor , 2011, IEEE Journal of Solid-State Circuits.

[2]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[4]  Kaushik Roy,et al.  An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[5]  Pedro López,et al.  An hybrid eDRAM/SRAM macrocell to implement first-level data caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  Jung Ho Ahn,et al.  A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.

[7]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[8]  Eric Rotenberg,et al.  Adaptive mode control: a static-power-efficient cache design , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[9]  Chris H. Kim,et al.  A 700MHz 2T1C embedded DRAM macro in a generic logic process with no boosted supplies , 2011, 2011 IEEE International Solid-State Circuits Conference.

[10]  Gu-Yeon Wei,et al.  Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11]  Lizy Kurian John,et al.  ESKIMO - energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Philip G. Emma,et al.  Rethinking Refresh: Increasing Availability and Reducing Power in DRAM for Cache Applications , 2008, IEEE Micro.

[13]  Richard E. Matick,et al.  Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..

[14]  David R. Kaeli,et al.  Exploiting temporal locality in drowsy cache policies , 2005, CF '05.

[15]  Margaret Martonosi,et al.  Managing leakage for transient data: decay and quasi-static 4T memory cells , 2002, ISLPED '02.

[16]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17]  David Blaauw,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[18]  Eric Rotenberg,et al.  Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[19]  Richard E. Matick,et al.  A 500 MHz Random Cycle, 1.5 ns Latency, SOI Embedded DRAM Macro Featuring a Three-Transistor Micro Sense Amplifier , 2008, IEEE Journal of Solid-State Circuits.

[20]  John E. Barth,et al.  Embedded DRAM: Technology platform for the Blue Gene/L chip , 2005, IBM J. Res. Dev..

[21]  Wei Wu,et al.  Reducing cache power with low-cost, multi-bit error-correcting codes , 2010, ISCA.

[22]  Jose Renau,et al.  Effective Optimistic-Checker Tandem Core Design through Architectural Pruning , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[23]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[24]  Babak Falsafi,et al.  Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[25]  Hsien-Hsin S. Lee,et al.  Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[26]  David Blaauw,et al.  Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits , 2010, Proceedings of the IEEE.