Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation

Compared to planar (i.e., two-dimensional) NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND flash memory. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory. In this paper, through experimental characterization of real, state-of-the-art 3D NAND flash memory chips, we find that 3D NAND flash memory exhibits three new error sources that were not previously observed in planar NAND flash memory: (1) layer-to-layer process variation, a new phenomenon specific to the 3D nature of the device, where the average error rate of each 3D-stacked layer in a chip is significantly different; (2) early retention loss, a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming; and (3) retention interference, a new phenomenon where the rate at which charge leaks from a flash cell is dependent on the data value stored in the neighboring cell. Based on our experimental results, we develop new analytical models of layer-to-layer process variation and retention loss in 3D NAND flash memory. Motivated by our new findings and models, we develop four new techniques to mitigate process variation and early retention loss in 3D NAND flash memory. Our first technique, Layer Variation Aware Reading (LaVAR), reduces the effect of layer-to-layer process variation by fine-tuning the read reference voltage separately for each layer. 
Our second technique, Layer-Interleaved Redundant Array of Independent Disks (LI-RAID), uses information about layer-to-layer process variation to intelligently group pages under the RAID error recovery technique in a manner that reduces the likelihood that the recovery of a group fails significantly earlier than the recovery of other groups. Our third technique, Retention Model Aware Reading (ReMAR), reduces retention errors in 3D NAND flash memory by tracking the retention time of the data using our new retention model and adapting the read reference voltage to data age. Our fourth technique, Retention Interference Aware Neighbor-Cell Assisted Correction (ReNAC), adapts the read reference voltage to the amount of retention interference a page has experienced, in order to re-read the data after a read operation fails. These four techniques are complementary and can be combined to significantly improve flash memory reliability. Compared to a state-of-the-art baseline, our techniques, when combined, improve flash memory lifetime by 1.85×. Alternatively, if a NAND flash vendor wants to keep the lifetime of the 3D NAND flash memory device constant, our techniques reduce the storage overhead required to hold error correction information by 78.9%.
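The grouping idea behind LI-RAID can be illustrated with a minimal sketch. This is not the paper's algorithm: the per-layer error rates are made-up relative numbers, the rotation-by-stride rule is just one possible way to spread layers across a group, and all function names are hypothetical. The sketch compares a baseline that builds each RAID group from the same layer on every chip against a layer-interleaved grouping, and reports the summed error rate of the worst group under each scheme.

```python
from typing import List, Tuple

Page = Tuple[int, int]  # (chip index, layer index); hypothetical addressing


def same_layer_groups(num_chips: int, num_layers: int) -> List[List[Page]]:
    """Baseline: group g holds the page at layer g from every chip."""
    return [[(chip, g) for chip in range(num_chips)]
            for g in range(num_layers)]


def layer_interleaved_groups(num_chips: int, num_layers: int) -> List[List[Page]]:
    """Illustrative interleaving: each chip contributes a different layer.

    Chip c contributes layer (g + c * stride) % num_layers to group g, so
    every page still belongs to exactly one group.
    """
    stride = max(1, num_layers // num_chips)
    return [[(chip, (g + chip * stride) % num_layers)
             for chip in range(num_chips)]
            for g in range(num_layers)]


def worst_group_score(groups: List[List[Page]],
                      layer_error_rate: List[int]) -> int:
    """Summed per-layer error rate of the least reliable group."""
    return max(sum(layer_error_rate[layer] for _, layer in group)
               for group in groups)


# Assumed relative error rates for 8 layers (not measured data).
rates = [1, 2, 3, 4, 5, 6, 7, 8]
baseline = same_layer_groups(num_chips=4, num_layers=8)
interleaved = layer_interleaved_groups(num_chips=4, num_layers=8)

print(worst_group_score(baseline, rates))     # 32
print(worst_group_score(interleaved, rates))  # 20
```

With these assumed numbers, the baseline's worst group consists entirely of pages from the least reliable layer, while interleaving mixes reliable and unreliable layers in every group, so no single group is dominated by the worst layer.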
