Mosaic: Exploiting the spatial locality of process variation to reduce refresh energy in on-chip eDRAM modules

eDRAM cells require periodic refresh, which consumes substantial energy in large last-level caches. It is well known that different eDRAM cells can exhibit very different charge-retention properties. Unfortunately, current systems pessimistically assume worst-case retention times and refresh all cells at a conservatively high rate. In this paper, we propose an alternative approach. We use known facts about the factors that determine the retention properties of cells to build a new model of eDRAM retention times, called Mosaic. The model shows that the retention times of cells in large eDRAM modules exhibit spatial correlation. Therefore, we logically divide the eDRAM module into regions, or tiles, profile the retention properties of each tile, and program each tile's refresh requirements into small counters in the cache controller. With this architecture, also called Mosaic, we refresh each tile at a different rate. The result is a 20x reduction in the number of refreshes in large eDRAM modules, practically eliminating refresh as a source of energy consumption.
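The per-tile refresh idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the fixed counter tick, and the retention values are all assumptions chosen for illustration. Each tile gets a refresh interval derived from its own profiled retention time, and the resulting number of refreshes is compared against a baseline that refreshes every tile at the worst-case rate.

```python
from typing import List, Tuple


def program_intervals(tile_retention_us: List[float], tick_us: float) -> List[int]:
    """Convert each tile's profiled retention time into a refresh interval,
    measured in counter ticks (hypothetical scheme, not from the paper).

    Intervals are rounded down so that no tile is refreshed later than its
    retention time allows, and clamped to at least one tick.
    """
    return [max(1, int(r // tick_us)) for r in tile_retention_us]


def refresh_counts(tile_retention_us: List[float], tick_us: float,
                   total_ticks: int) -> Tuple[int, int]:
    """Return (per_tile_refreshes, worst_case_refreshes) over total_ticks.

    per_tile: every tile is refreshed at its own interval.
    worst_case: every tile is refreshed at the shortest interval of any tile,
    which models the conservative baseline.
    """
    intervals = program_intervals(tile_retention_us, tick_us)
    per_tile = sum(total_ticks // n for n in intervals)
    worst_case = len(intervals) * (total_ticks // min(intervals))
    return per_tile, worst_case


# Illustrative values only: four tiles with assumed retention times in microseconds.
retention = [50.0, 200.0, 400.0, 1000.0]
print(program_intervals(retention, tick_us=50.0))                # [1, 4, 8, 20]
print(refresh_counts(retention, tick_us=50.0, total_ticks=400))  # (570, 1600)
```

In this toy setting the saving comes entirely from the spread between the weakest tile and the rest. The reduction reported in the abstract depends on the measured retention distribution and tile size, which this sketch does not model.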
