论文信息 - Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies

Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies

As manycores use dynamic energy ever more efficiently, static power consumption becomes a major concern. In particular, in a large manycore running at a low voltage, leakage in on-chip memory modules contributes substantially to the chip's power draw. This is unfortunate, given that, intuitively, the large multi-level cache hierarchy of a manycore is likely to contain a lot of useless data. An effective way to reduce this problem is to use a low-leakage technology such as embedded DRAM (eDRAM). However, eDRAM requires refresh. In this paper, we examine the opportunity of minimizing on-chip memory power further by intelligently refreshing on-chip eDRAM. We present Refrint, a simple approach to perform fine-grained, intelligent refresh of on-chip eDRAM multiprocessor cache hierarchies. We introduce the Refrint algorithms and microarchitecture. We evaluate Refrint in a simulated manycore running 16-threaded parallel applications. We show that an eDRAM-based memory hierarchy with Refrint consumes only 30% of the energy of a conventional SRAM-based memory hierarchy, and induces a slowdown of only 6%. In contrast, an eDRAM-based memory hierarchy without Refrint consumes 56% of the energy of the conventional memory hierarchy, inducing a slowdown of 25%.

[1] Balaram Sinharoy,et al. POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor , 2011, IEEE Journal of Solid-State Circuits.

[2] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[4] Kaushik Roy,et al. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[5] Pedro López,et al. An hybrid eDRAM/SRAM macrocell to implement first-level data caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6] Jung Ho Ahn,et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.

[7] Kaushik Roy,et al. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[8] Eric Rotenberg,et al. Adaptive mode control: a static-power-efficient cache design , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[9] Chris H. Kim,et al. A 700MHz 2T1C embedded DRAM macro in a generic logic process with no boosted supplies , 2011, 2011 IEEE International Solid-State Circuits Conference.

[10] Gu-Yeon Wei,et al. Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11] Lizy Kurian John,et al. ESKIMO - energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12] Philip G. Emma,et al. Rethinking Refresh: Increasing Availability and Reducing Power in DRAM for Cache Applications , 2008, IEEE Micro.

[13] Richard E. Matick,et al. Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..

[14] David R. Kaeli,et al. Exploiting temporal locality in drowsy cache policies , 2005, CF '05.

[15] Margaret Martonosi,et al. Managing leakage for transient data: decay and quasi-static 4T memory cells , 2002, ISLPED '02.

[16] Onur Mutlu,et al. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17] David Blaauw,et al. Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[18] Eric Rotenberg,et al. Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[19] Richard E. Matick,et al. A 500 MHz Random Cycle, 1.5 ns Latency, SOI Embedded DRAM Macro Featuring a Three-Transistor Micro Sense Amplifier , 2008, IEEE Journal of Solid-State Circuits.

[20] John E. Barth,et al. Embedded DRAM: Technology platform for the Blue Gene/L chip , 2005, IBM J. Res. Dev..

[21] Wei Wu,et al. Reducing cache power with low-cost, multi-bit error-correcting codes , 2010, ISCA.

[22] Jose Renau,et al. Effective Optimistic-Checker Tandem Core Design through Architectural Pruning , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[23] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[24] Babak Falsafi,et al. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[25] Hsien-Hsin S. Lee,et al. Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[26] David Blaauw,et al. Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits , 2010, Proceedings of the IEEE.