Using run-time reverse-engineering to optimize DRAM refresh

The overhead of DRAM refresh is increasing with each density generation. To help offset some of this overhead, JEDEC designed the modern Auto-Refresh command with a highly optimized architecture internal to the DRAM, one that violates the timing rules external controllers must obey during normal operation. Numerous refresh-reduction schemes manually refresh the DRAM row by row, eliminating unnecessary refreshes to improve both the energy consumption and the performance of the DRAM. However, it has been shown that modern Auto-Refresh is incompatible with these schemes: because they refresh specified rows manually through explicit Activate and Precharge commands, they cannot exploit the architectural optimizations available internally for Auto-Refresh operations. This paper shows that various DRAM timing parameters that must be followed during normal DRAM operation can be reduced when performing refresh operations. By reverse-engineering those internal timing parameters at system-init time, an external memory controller can use them in conjunction with individual Activate and Precharge commands, thereby reducing the performance overhead of refresh while simultaneously supporting row-by-row refresh-reduction schemes. Through physical experiments and measurement, we find that our optimized scheme reduces tRFC by up to 45% compared to the already highly optimized Auto-Refresh mechanism. It is also 10% more energy-efficient and 50% more performance-efficient than non-optimized row-by-row refresh. Further evaluations that simulate future 16 Gb DDR4 devices show how the reduction in tRFC improves application performance and energy efficiency. The proposed technique enhances all existing refresh-optimization schemes that use row-by-row refresh, and it does so without requiring any modification to the DRAM or the DRAM protocol.
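As a rough illustration of what a tRFC reduction means for availability, the sketch below computes the fraction of time a rank is blocked by refresh as tRFC / tREFI. The timing values (tREFI = 7.8 µs, tRFC = 350 ns) are illustrative assumptions, not figures from this paper; only the 45% figure comes from the abstract, where it is stated as an upper bound. The function name `refresh_overhead` is hypothetical.

```python
def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a rank is unavailable due to refresh (tRFC / tREFI)."""
    if t_refi_ns <= 0 or t_rfc_ns < 0:
        raise ValueError("timings must be non-negative, with tREFI > 0")
    return t_rfc_ns / t_refi_ns


# Assumed, illustrative timings (not taken from the paper).
T_REFI_NS = 7800.0  # average interval between refresh commands
T_RFC_NS = 350.0    # refresh cycle time of the baseline Auto-Refresh

# Upper bound on the tRFC reduction reported in the abstract.
MAX_TRFC_REDUCTION = 0.45

baseline = refresh_overhead(T_RFC_NS, T_REFI_NS)
optimized = refresh_overhead(T_RFC_NS * (1.0 - MAX_TRFC_REDUCTION), T_REFI_NS)

print(f"baseline refresh overhead:  {baseline:.2%}")
print(f"optimized refresh overhead: {optimized:.2%}")
```

Because the overhead is linear in tRFC, the best-case 45% reduction shrinks the blocked fraction by the same 45% under these assumptions; actual gains depend on the device and on how many rows a refresh-reduction scheme skips.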
