Data retention in MLC NAND flash memory: Characterization, optimization, and recovery

Retention errors, caused by charge leakage over time, are the dominant source of flash memory errors. Understanding, characterizing, and reducing retention errors can significantly improve NAND flash memory reliability and endurance. In this paper, we first characterize, with real 2y-nm MLC NAND flash chips, how the threshold voltage distribution of flash memory changes with retention age, the length of time since a flash cell was programmed. We observe from our characterization results that 1) the optimal read reference voltage of a flash cell, i.e., the voltage at which the data can be read with the lowest raw bit error rate (RBER), changes systematically with the cell's retention age, and 2) different regions of flash memory can have different retention ages, and hence different optimal read reference voltages. Based on these findings, we propose two new techniques. First, Retention Optimized Reading (ROR) adaptively learns and applies the optimal read reference voltage for each flash memory block online. The key idea of ROR is to periodically learn a tight upper bound on the optimal read reference voltage and to approach the optimum from that bound. Our evaluations show that ROR can extend flash memory lifetime by 64% and reduce average error correction latency by 10.1%, with only 768 KB of storage overhead in flash memory for a 512 GB flash-based SSD. Second, Retention Failure Recovery (RFR) recovers data with uncorrectable errors offline by identifying and probabilistically correcting flash cells with retention errors. Our evaluation shows that RFR reduces RBER by 50%, which essentially doubles the error correction capability, and thus can effectively recover data from otherwise uncorrectable flash errors.
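The key idea of ROR stated above (learn a tight upper bound, then approach the optimal read reference voltage from it) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the integer voltage steps, and the toy quadratic error model are all assumptions made for illustration. The sketch also assumes that the error count has a single minimum below the upper bound, so stepping downward can stop as soon as the count no longer improves.

```python
from typing import Callable, Tuple


def approach_optimal_vref(
    count_errors: Callable[[int], int],
    upper_bound: int,
    lower_limit: int,
    step: int = 1,
) -> Tuple[int, int]:
    """Step the read reference voltage down from a known upper bound.

    count_errors(v) is a hypothetical callback that reads a block at
    reference voltage v and returns the number of raw bit errors.
    Returns (best voltage, error count at that voltage).
    """
    best_v = upper_bound
    best_err = count_errors(best_v)
    v = upper_bound - step
    while v >= lower_limit:
        err = count_errors(v)
        if err >= best_err:
            # Errors stopped decreasing: we have stepped past the optimum.
            break
        best_v, best_err = v, err
        v -= step
    return best_v, best_err


def toy_error_count(true_opt: int) -> Callable[[int], int]:
    """Toy error model (an assumption): errors grow quadratically
    with the distance from the true optimal voltage."""
    def count(v: int) -> int:
        return 5 + (v - true_opt) ** 2
    return count


if __name__ == "__main__":
    # Voltages are in arbitrary integer steps, not real device units.
    print(approach_optimal_vref(toy_error_count(42), upper_bound=50, lower_limit=0))
```

The tighter the upper bound, the fewer trial reads the search needs: in the toy run above, starting at 50 takes ten reads to settle on 42, whereas starting at 43 would take three.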
