Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices

Experimental characterization of DRAM errors is a powerful technique for understanding DRAM behavior, and it provides valuable insights for improving overall system performance, energy efficiency, and reliability. Unfortunately, recent DRAM technology scaling issues are forcing manufacturers to adopt on-die error-correction codes (ECC). These codes pose a significant challenge for DRAM error characterization studies, because undocumented, proprietary, and opaque error-correction hardware obfuscates the raw error distributions. As we show in this work, errors observed in devices with on-die ECC no longer follow expected, well-studied distributions (e.g., lognormal retention times). Instead, they depend on the particular ECC scheme used.
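The following is a minimal sketch of why observed errors depend on the ECC scheme. It uses a classic Hamming(7,4) single-error-correcting code as a hypothetical stand-in for on-die ECC. Real on-die ECC schemes are undocumented, so the code, the 4-bit data word, the independent per-bit raw error probability `p`, and the helper names `post_correction_data_errors` and `error_count_distributions` are all illustrative assumptions. The sketch enumerates every raw error pattern, decodes it, and compares the distribution of raw data-bit errors with the distribution a tester would observe after correction.

```python
import itertools
from collections import Counter

# Hypothetical model: classic Hamming(7,4) SEC code standing in for on-die ECC.
# Codeword positions are numbered 1..7; parity bits occupy positions 1, 2, 4
# and data bits occupy positions 3, 5, 6, 7.
N = 7
DATA_POSITIONS = (3, 5, 6, 7)


def syndrome(error_positions):
    """XOR of the 1-indexed positions that flipped.

    The code is linear, so the syndrome depends only on the error pattern,
    not on the stored data."""
    s = 0
    for pos in error_positions:
        s ^= pos
    return s


def post_correction_data_errors(error_positions):
    """Return the set of data-bit positions that are wrong after decoding."""
    errors = set(error_positions)
    s = syndrome(errors)
    if s != 0:
        # The decoder assumes a single error at position s and flips that bit,
        # which miscorrects a previously correct bit if there were 2+ errors.
        errors ^= {s}
    return errors & set(DATA_POSITIONS)


def error_count_distributions(p):
    """Exact distributions of data-bit error counts, before and after ECC,
    assuming each of the 7 codeword bits fails independently with prob. p."""
    raw = Counter()
    observed = Counter()
    for pattern in itertools.product((0, 1), repeat=N):
        positions = [i + 1 for i, bit in enumerate(pattern) if bit]
        prob = p ** len(positions) * (1 - p) ** (N - len(positions))
        raw[sum(1 for pos in positions if pos in DATA_POSITIONS)] += prob
        observed[len(post_correction_data_errors(positions))] += prob
    return raw, observed


if __name__ == "__main__":
    raw, observed = error_count_distributions(1e-3)
    for k in range(len(DATA_POSITIONS) + 1):
        print(k, f"raw={raw[k]:.3e}", f"observed={observed[k]:.3e}")
```

Under this model, every single raw error disappears after correction, while some double raw errors are miscorrected into three-bit errors in the data (for example, raw errors at positions 3 and 5 yield observed errors at 3, 5, and 6). The observed error counts therefore no longer follow the binomial distribution of the raw errors, and a different code (another parity-check matrix or word length) would reshape them differently.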
