RePRAM: Re-cycling PRAM faulty blocks for extended lifetime

As main memory systems begin to face the scaling challenges from DRAM technology, future computer systems need to adapt to the emerging memory technologies like Phase-Change Memory (PCM or PRAM). While these newer technologies offer advantages such as storage density, non-volatility, and low energy consumption, they are constrained by limited write endurance that becomes more pronounced with process variation. In this paper, we propose a novel PRAM-based main memory system, RePRAM (Recycling PRAM), which leverages a group of faulty pages and recycles them in a managed way to significantly extend the PRAM lifetime while minimizing the performance impact. In particular, we explore two different dimensions of dynamic redundancy levels and group sizes, and design low-cost hardware and software support for RePRAM. Our proposed scheme involves minimal hardware modifications (that have less than 1% on-chip and off-chip area overheads). Also, our schemes can improve the PRAM lifetime by up to 43× (times) over a chip with no error correction capabilities, and outperform prior schemes such as DRM and ECP at a small fraction of the hardware cost. The performance overhead resulting from our scheme is less than 7% on average across 21 applications from SPEC2006, Splash-2, and PARSEC benchmark suites.

[1]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[2]  Randy H. Katz RAID: A Personal Recollection of How Storage Became a System , 2010, IEEE Ann. Hist. Comput..

[3]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  H. Howie Huang,et al.  Energy-aware writes to non-volatile main memory , 2011, OPSR.

[5]  Engin Ipek,et al.  Dynamically replicated memory: building reliable systems from nanoscale resistive memories , 2010, ASPLOS XV.

[6]  Seung-Yun Lee,et al.  A Low Power Phase-Change Random Access Memory using a Data-Comparison Write Scheme , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[7]  YangJun,et al.  A durable and energy efficient main memory using phase change memory technology , 2009 .

[8]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[9]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[10]  Shih-Hung Chen,et al.  Phase-change random access memory: A scalable technology , 2008, IBM J. Res. Dev..

[11]  Sean Eilert,et al.  Phase Change Memory: A New Memory Enables New Memory Usage Models , 2009, 2009 IEEE International Memory Workshop.

[12]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[13]  Jun Yang,et al.  LLS: Cooperative integration of wear-leveling and salvaging for PCM main memory , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[14]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[15]  Tao Li,et al.  Characterizing and mitigating the impact of process variations on phase change based memory systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[16]  Karin Strauss,et al.  Use ECP, not ECC, for hard failures in resistive memories , 2010, ISCA.

[17]  Rami G. Melhem,et al.  RDIS: A recursively defined invertible set scheme to tolerate multiple stuck-at faults in resistive memory , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).

[18]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[19]  H. Howie Huang,et al.  rPRAM: Exploring Redundancy Techniques to Improve Lifetime of PCM-based Main Memory , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[20]  Yuanyuan Zhou,et al.  SafeMem: exploiting ECC-memory for detecting memory leaks and memory corruption during production runs , 2005, 11th International Symposium on High-Performance Computer Architecture.

[21]  Nikolai Joukov,et al.  RAIF: Redundant Array of Independent Filesystems , 2007, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007).

[22]  Wei Wu,et al.  Reducing cache power with low-cost, multi-bit error-correcting codes , 2010, ISCA.

[23]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988, SIGMOD '88.

[24]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[25]  Hsien-Hsin S. Lee,et al.  Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping , 2010, ISCA.

[26]  H KatzRandy,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988 .

[27]  Norman P. Jouppi,et al.  FREE-p: Protecting non-volatile memory against both hard and soft errors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[28]  Hsien-Hsin S. Lee,et al.  SAFER: Stuck-At-Fault Error Recovery for Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.