Efficient scrub mechanisms for error-prone emerging memories

Many memory cell technologies are being considered as possible replacements for DRAM and Flash technologies, both of which are nearing their scaling limits. While these new cells (PCM, STT-RAM, FeRAM, etc.) promise high density, better scaling, and non-volatility, they introduce new challenges. Solutions at the architecture level can help address some of these problems; e.g., prior research has proposed wear-leveling and hard-error tolerance mechanisms to overcome the limited write endurance of PCM cells. In this paper, we focus on the soft error problem in PCM, a topic that has received little attention in the architecture community. Soft errors in DRAM memories are typically addressed by having SECDED support and a scrub mechanism. The scrub mechanism scans the memory looking for single-bit errors and corrects each one before the line experiences a second error, which SECDED cannot correct. However, PCM and other emerging memories are prone to new sources of soft errors. In particular, multi-level cell (MLC) PCM devices will suffer from resistance drift, which increases the soft error rate and incurs high overheads for the scrub mechanism. This paper is the first to study the design of architectural scrub mechanisms, especially when tailored to the drift phenomenon in MLC PCM. Many of our solutions will also apply to other soft-error-prone emerging memories. We first show that scrub overheads can be reduced with support for strong ECC codes and a lightweight error detection operation. We then design different scrub algorithms that can adaptively trade off soft and hard errors. Using an approach that combines all proposed solutions, our scrub mechanism yields a 96.5% reduction in uncorrectable errors, a 24.4× decrease in scrub-related writes, and a 37.8% reduction in scrub energy, relative to a basic scrub algorithm used in modern DRAM systems.
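
The basic DRAM-style scrub described above can be illustrated with a toy model. The sketch below is not the paper's mechanism. It assumes each memory line is represented only by a count of flipped bits, and the names `ScrubStats`, `scrub_pass`, and the `correctable` parameter are hypothetical. It shows one sweep that reads every line, writes back lines whose errors the code can still correct, and counts lines that have already exceeded the correction capability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrubStats:
    reads: int = 0          # lines read by the scrubber
    writes: int = 0         # corrective write-backs issued
    uncorrectable: int = 0  # lines with more errors than the code can fix


def scrub_pass(error_counts: List[int], correctable: int = 1) -> ScrubStats:
    """Run one scrub sweep over a toy memory.

    error_counts[i] is the number of flipped bits in line i (an assumed
    abstraction; real hardware would decode ECC check bits instead).
    correctable is the number of bit errors per line the ECC can fix
    (1 models SECDED). The list is updated in place.
    """
    stats = ScrubStats()
    for i, errors in enumerate(error_counts):
        stats.reads += 1
        if errors == 0:
            continue                 # clean line, nothing to do
        if errors <= correctable:
            error_counts[i] = 0      # corrected data is written back
            stats.writes += 1
        else:
            stats.uncorrectable += 1  # too late: the line is left as-is
    return stats


if __name__ == "__main__":
    lines = [0, 1, 2, 0, 1]
    print(scrub_pass(lines, correctable=1), lines)
```

In this model, raising `correctable` stands in for a stronger ECC code: lines that would be lost under SECDED remain recoverable at the next sweep, which is one intuition for why stronger codes can relax how often the scrubber must run.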
