Experimental Characterization, Optimization, and Recovery of Data Retention Errors in MLC NAND Flash Memory

This paper summarizes our work on experimentally characterizing, mitigating, and recovering data retention errors in multi-level cell (MLC) NAND flash memory, which was published in HPCA 2015, and examines the work's significance and future potential. Retention errors, caused by charge leakage over time, are the dominant source of flash memory errors. Understanding, characterizing, and reducing retention errors can significantly improve NAND flash memory reliability and endurance. In this work, we first characterize, with real 2Y-nm MLC NAND flash chips, how the threshold voltage distribution of flash memory changes with different retention ages -- the length of time since a flash cell was programmed. We observe from our characterization results that 1) the optimal read reference voltage of a flash cell, using which the data can be read with the lowest raw bit error rate (RBER), systematically changes with its retention age, and 2) different regions of flash memory can have different retention ages, and hence different optimal read reference voltages. Based on our findings, we propose two new techniques. First, Retention Optimized Reading (ROR) adaptively learns and applies the optimal read reference voltage for each flash memory block online. The key idea of ROR is to periodically learn a tight upper bound of the optimal read reference voltage, and from there approach the optimal read reference voltage. Our evaluations show that ROR can extend flash memory lifetime by 64% and reduce average error correction latency by 10.1%. Second, Retention Failure Recovery (RFR) recovers data with uncorrectable errors offline by identifying and probabilistically correcting flash cells with retention errors. Our evaluation shows that RFR essentially doubles the error correction capability.

[1]  Zhen Fang,et al.  Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Lizy Kurian John,et al.  Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[3]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[4]  Jin Sun,et al.  Utility-Based Hybrid Memory Management , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).

[5]  Onur Mutlu,et al.  Tiered-latency DRAM: A low latency and low cost DRAM architecture , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[6]  Jing Li,et al.  Evaluating Row Buffer Locality in Future Non-Volatile Main Memories , 2018, ArXiv.

[7]  Eric Rotenberg,et al.  Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[8]  Osman S. Unsal,et al.  Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[9]  Onur Mutlu,et al.  PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[10]  Onur Mutlu,et al.  SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[11]  J. Kessenich,et al.  Bit error rate in NAND Flash memories , 2008, 2008 IEEE International Reliability Physics Symposium.

[12]  Rachata Ausavarungnirun,et al.  Row buffer locality aware caching policies for hybrid memories , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[13]  Shimeng Yu,et al.  Metal–Oxide RRAM , 2012, Proceedings of the IEEE.

[14]  Tong Zhang,et al.  Quasi-nonvolatile SSD: Trading flash memory nonvolatility to improve storage system performance for enterprise applications , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[15]  Onur Mutlu,et al.  The RowHammer problem and other issues we may face as memory becomes denser , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[16]  Onur Mutlu,et al.  Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization , 2016, SIGMETRICS.

[17]  L. Chua Memristor-The missing circuit element , 1971 .

[18]  Alessandro Calderoni,et al.  A copper ReRAM cell for Storage Class Memory applications , 2014, 2014 Symposium on VLSI Technology (VLSI-Technology): Digest of Technical Papers.

[19]  Onur Mutlu,et al.  Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[21]  Onur Mutlu,et al.  Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery , 2017, ArXiv.

[22]  Srinivas Devadas,et al.  Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Jongmoo Choi,et al.  Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[24]  Shuhei Tanakamaru,et al.  95%-lower-BER 43%-lower-power intelligent solid-state drive (SSD) with asymmetric coding and stripe pattern elimination algorithm , 2011, 2011 IEEE International Solid-State Circuits Conference.

[25]  Onur Mutlu,et al.  Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[26]  Mohammad Arjomand,et al.  Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory , 2017, SIGMETRICS.

[27]  Ken Mai,et al.  FPGA-Based Solid-State Drive Prototyping Platform , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[28]  S. Phadke,et al.  MLP aware heterogeneous memory system , 2011, 2011 Design, Automation & Test in Europe.

[29]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[30]  Onur Mutlu,et al.  The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[31]  José F. Martínez,et al.  Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems , 2013, ISCA.

[32]  Seiichi Aritome,et al.  Data-Retention Characteristics Comparison of 2D and 3D TLC NAND Flash Memories , 2017, 2017 IEEE International Memory Workshop (IMW).

[33]  Onur Mutlu,et al.  Improving DRAM performance by parallelizing refreshes with accesses , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[34]  Onur Mutlu,et al.  Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories , 2014, ACM Trans. Archit. Code Optim..

[35]  Jun Yang,et al.  Mitigating Write Disturbance in Super-Dense Phase Change Memories , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[36]  Alessandro Calderoni,et al.  Challenges for high-density 16Gb ReRAM with 27nm technology , 2015 .

[37]  Wook-Ghee Hahn,et al.  7.2 A 128Gb 3b/cell V-NAND flash memory with 1Gb/s I/O rate , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[38]  Onur Mutlu,et al.  Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[39]  Onur Mutlu,et al.  ChargeCache: Reducing DRAM latency by exploiting row access locality , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[40]  Onur Mutlu,et al.  Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[41]  Ricardo Bianchini,et al.  Page placement in hybrid memory systems , 2011, ICS '11.

[42]  Onur Mutlu,et al.  Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives , 2017, Proceedings of the IEEE.

[43]  Onur Mutlu,et al.  HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[44]  Haralampos Pozidis,et al.  Multilevel-Cell Phase-Change Memory: A Viable Technology , 2016, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[45]  Onur Mutlu,et al.  Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.

[46]  Jeong-Don Ihm,et al.  7.1 256Gb 3b/cell V-NAND flash memory with 48 stacked WL layers , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[47]  Qiao Li,et al.  Reducing LDPC soft sensing latency by lightweight data refresh for flash read performance improvement , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[48]  Tao Li,et al.  Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[49]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[50]  Onur Mutlu,et al.  The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study , 2014, SIGMETRICS '14.

[51]  O. Mutlu,et al.  Enabling Accurate and Practical Online Flash Channel Modeling for Modern MLC NAND Flash Memory , 2016, IEEE Journal on Selected Areas in Communications.

[52]  Jihong Kim,et al.  Flashdefibrillator: a data recovery technique for retention failures in NAND flash memory , 2015, 2015 IEEE Non-Volatile Memory System and Applications Symposium (NVMSA).

[53]  Kinam Kim,et al.  A New Investigation of Data Retention Time in Truly Nanoscaled DRAMs , 2009, IEEE Electron Device Letters.

[54]  Nikolaos Papandreou,et al.  Using adaptive read voltage thresholds to enhance the reliability of MLC NAND flash memory systems , 2014, GLSVLSI '14.

[55]  Thomas P. Parnell,et al.  Modelling of the threshold voltage distributions of sub-20nm NAND flash memory , 2014, 2014 IEEE Global Communications Conference.

[56]  Onur Mutlu,et al.  Simultaneous Multi-Layer Access , 2016, ACM Trans. Archit. Code Optim..

[57]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[58]  Lizy Kurian John,et al.  ESKIMO - energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[59]  Onur Mutlu,et al.  AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[60]  Yoon-Hee Choi,et al.  Three-Dimensional 128 Gb MLC Vertical nand Flash Memory With 24-WL Stacked Layers and 50 MB/s High-Speed Programming , 2014, IEEE Journal of Solid-State Circuits.

[61]  Chung Lam,et al.  7.3 A resistance-drift compensation scheme to reduce MLC PCM raw BER by over 100× for storage-class memory applications , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[62]  You Zhou,et al.  Characterizing 3D Floating Gate NAND Flash , 2017, SIGMETRICS.

[63]  Jongmoo Choi,et al.  WARM: Improving NAND flash memory lifetime with write-hotness aware retention management , 2015, 2015 31st Symposium on Mass Storage Systems and Technologies (MSST).

[64]  Wei Wu,et al.  Optimizing NAND flash-based SSDs via retention relaxation , 2012, FAST.

[65]  A. Pirovano,et al.  Low-field amorphous state resistance and threshold voltage drift in chalcogenide materials , 2004, IEEE Transactions on Electron Devices.

[66]  Richard Veras,et al.  RAIDR: Retention-aware intelligent DRAM refresh , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[67]  Kevin K. Chang,et al.  Understanding and Improving the Latency of DRAM-Based Memory Systems , 2017, ArXiv.

[68]  Arif Merchant,et al.  Reliability of nand-Based SSDs: What Field Studies Tell Us , 2017, Proceedings of the IEEE.

[69]  Onur Mutlu,et al.  Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[70]  Onur Mutlu,et al.  Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.

[71]  D. Ielmini,et al.  Recovery and Drift Dynamics of Resistance and Threshold Voltages in Phase-Change Memories , 2007, IEEE Transactions on Electron Devices.

[72]  Osman S. Unsal,et al.  Neighbor-cell assisted error correction for MLC NAND flash memories , 2014, SIGMETRICS '14.

[73]  Edwin Hsing-Mean Sha,et al.  Retention Trimming for Lifetime Improvement of Flash Memory Storage Systems , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[74]  T. Hamamoto,et al.  On the retention time distribution of dynamic random access memory (DRAM) , 1998 .

[75]  Onur Mutlu,et al.  ERRoR ANAlysIs AND RETENTIoN-AwARE ERRoR MANAgEMENT FoR NAND FlAsh MEMoRy , 2013 .

[76]  Onur Mutlu,et al.  Phase change memory architecture and the quest for scalability , 2010, Commun. ACM.

[77]  Onur Mutlu,et al.  Data retention in MLC NAND flash memory: Characterization, optimization, and recovery , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[78]  Yan Solihin,et al.  CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[79]  Gabriel H. Loh,et al.  Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[80]  Rino Micheloni,et al.  3D Flash Memories , 2016, Springer Netherlands.

[81]  Onur Mutlu,et al.  The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[82]  Rami G. Melhem,et al.  Refresh Now and Then , 2014, IEEE Transactions on Computers.

[83]  Luca Crippa,et al.  Array Architectures for 3-D NAND Flash Memories , 2017, Proceedings of the IEEE.

[84]  Onur Mutlu,et al.  A Case for Effic ient Hardware/Soft ware Cooperative Management of Storage and Memory , 2013 .

[85]  Chris Fallin,et al.  Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[86]  Hongzhong Zheng,et al.  Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling , 2014 .

[87]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[88]  Robert H. Dennard,et al.  Challenges and future directions for the scaling of dynamic random-access memory (DRAM) , 2002, IBM J. Res. Dev..

[89]  Onur Mutlu,et al.  Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[90]  Onur Mutlu,et al.  An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms , 2013, ISCA.

[91]  Yixin Luo,et al.  Improving the reliability of chip-off forensic analysis of NAND flash memory devices , 2017, Digit. Investig..

[92]  Sung-Jin Choi,et al.  Comprehensive evaluation of early retention (fast charge loss within a few seconds) characteristics in tube-type 3-D NAND flash memory , 2016, 2016 IEEE Symposium on VLSI Technology.

[93]  Onur Mutlu,et al.  Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis, and modeling , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[94]  Shu Lin,et al.  Error Control Coding , 2004 .

[95]  H.-S. Philip Wong,et al.  Phase Change Memory , 2010, Proceedings of the IEEE.

[96]  Qiang Wu,et al.  A Large-Scale Study of Flash Memory Failures in the Field , 2015, SIGMETRICS 2015.

[97]  Onur Mutlu,et al.  Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.

[98]  Onur Mutlu,et al.  Research Problems and Opportunities in Memory Systems , 2014, Supercomput. Front. Innov..

[99]  Zhe Zhang,et al.  Memory module-level testing and error behaviors for phase change memory , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[100]  Jie Liu,et al.  SSD Failures in Datacenters: What? When? and Why? , 2016, SYSTOR.

[101]  Richard W. Hamming,et al.  Error detecting and error correcting codes , 1950 .

[102]  Jun Yang,et al.  Phase-Change Technology and the Future of Main Memory , 2010, IEEE Micro.

[103]  Aamer Jaleel,et al.  CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[104]  Onur Mutlu,et al.  A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM , 2017, IEEE Computer Architecture Letters.

[105]  Onur Mutlu,et al.  Adaptive-latency DRAM: Optimizing DRAM timing for the common-case , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[106]  Onur Mutlu,et al.  Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation , 2013, ICCD.

[107]  Rachata Ausavarungnirun,et al.  RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[108]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[109]  Aamer Jaleel,et al.  BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).