Data Block Partitioning for Recovering Stuck-at Faults in PCMs

The main obstacles to DRAM scalability are leakage and limits on charge storage. Phase Change Memory (PCM) is regarded as a promising candidate to replace DRAM among competing non-volatile memories. However, PCM suffers from low cell reliability due to limited write endurance, which can leave some memory cells permanently stuck at either '0' or '1'. A robust error recovery scheme is therefore needed to recover from these hard errors. State-of-the-art solutions apply error correction and recovery techniques at the inter-line or intra-line level: they improve PCM endurance either by remapping failed lines to spares (inter-line schemes) or by using data-block partitioning and bit inversion (intra-line schemes). Although techniques of the latter type are effective, they require proper partitioning of data blocks so that faults are spread across different groups. In this paper, we propose and evaluate a novel intra-line scheme that statically partitions a data block into groups and efficiently recovers multi-bit stuck-at faults per partition. The method uses a simple shifting mechanism to increase the chance of storing data in the presence of failed cells. Evaluation results for multi-threaded workloads show an increase in the number of recoverable failures and an improvement in lifetime over existing techniques.
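
The general idea of partitioning, bit inversion, and shifting can be illustrated with a minimal sketch. It is not the scheme proposed in the paper, whose details the abstract does not give. It assumes contiguous fixed-size groups, one inversion flag per group, and a cyclic rotation of the whole block as the "shift"; the names `group_flags`, `rotate`, `encode`, and `decode` are hypothetical. A group can be stored only if all of its stuck cells agree on whether the data must be inverted, and rotating the block gives further chances when they disagree.

```python
from typing import Dict, List, Optional, Tuple

def group_flags(bits: List[int], stuck: Dict[int, int],
                group_size: int) -> Optional[List[bool]]:
    """Per-group inversion flags, or None if some group cannot be stored.

    `stuck` maps a cell position to the value it is stuck at (assumed model).
    """
    flags = []
    for start in range(0, len(bits), group_size):
        need = set()
        for pos in range(start, min(start + group_size, len(bits))):
            if pos in stuck:
                # True means this stuck cell only matches the inverted bit.
                need.add(bits[pos] != stuck[pos])
        if len(need) > 1:
            return None  # stuck cells in this group disagree
        flags.append(need.pop() if need else False)
    return flags

def rotate(bits: List[int], shift: int) -> List[int]:
    """Cyclically rotate right by `shift` (negative values rotate left)."""
    shift %= len(bits)
    return bits[-shift:] + bits[:-shift] if shift else list(bits)

def encode(data: List[int], stuck: Dict[int, int],
           group_size: int) -> Optional[Tuple[int, List[bool], List[int]]]:
    """Try each rotation until every group is storable with one flag."""
    for shift in range(len(data)):
        rotated = rotate(data, shift)
        flags = group_flags(rotated, stuck, group_size)
        if flags is not None:
            stored = [b ^ int(flags[i // group_size])
                      for i, b in enumerate(rotated)]
            return shift, flags, stored
    return None  # unrecoverable under this simple model

def decode(stored: List[int], shift: int, flags: List[bool],
           group_size: int) -> List[int]:
    """Undo the per-group inversion, then undo the rotation."""
    rotated = [b ^ int(flags[i // group_size]) for i, b in enumerate(stored)]
    return rotate(rotated, -shift)
```

In this sketch the shift amount and the inversion flags are metadata that would have to be kept in reliable storage alongside the block; how the paper's scheme stores and bounds that metadata is not stated in the abstract.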
