Exploring the Potential for Collaborative Data Compression and Hard-Error Tolerance in PCM Memories

Limited write endurance is the main obstacle standing in the way of using phase change memory (PCM) in future computing systems. While several wear-leveling and hard-error tolerant techniques have been proposed for improving PCM lifetime, most of these approaches assume that the underlying memory uses a very simple write traffic reduction scheme (e.g., buffering, differential writes). In particular, most PCM prototypes/chips are equipped with an embedded circuit to support differential writes (DW) – on a write, only the bits that differ between the old and new data are updated. With DW, the bit-pattern of updates in a memory block is usually random, which limits the opportunity to exploit the resulting bit pattern for lifetime enhancement at an architecture level (e.g., using techniques such as wear-leveling and hard-error tolerance). This paper focuses on this inefficiency and proposes a solution based on data compression. Employing compression can improve the lifetime of the PCM memory. Using state-of-the-art compression schemes, the size of the compressed data is usually much smaller than the original data written back to memory from the last-level cache on an eviction. By storing data in a compressed format in the target memory block, first, we limit the number of bit flips to fewer memory cells, enabling more efficient intra-line wear-leveling and error recovery, and second, the unused bits in the memory block can be reused as replacements for faulty bits given the reduced size of the (compressed) data. It can also happen that for a portion of the memory blocks, the resulting compressed data is not very small. This can be due to increased data entropy introduced by compression, where the total number of bit flips will be increased over the baseline system. In this paper, we present an approach that provides collaborative operation of data compression, differential writes, wear-leveling and hard-error tolerant techniques targeting PCM memories. We propose approaches that reap the maximum benefits from compression, while also enjoying the benefits of techniques that reduce the number of high-entropy writes. Using an approach that combines different solutions, our mechanism tolerates 2.9× more cell failures per memory line and achieves a 4.3× increase in PCM memory lifetime, relative to our baseline state-of-the-art PCM DIMM memory.

[1]  Kinarn Kim,et al.  Reliability investigations for manufacturable high density PRAM , 2005, 2005 IEEE International Reliability Physics Symposium, 2005. Proceedings. 43rd Annual..

[2]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3]  Sungjoo Hong,et al.  Memory technology trend and future challenges , 2010, 2010 International Electron Devices Meeting.

[4]  Eduardo Pinheiro,et al.  DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.

[5]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[6]  Kartik Mohanram,et al.  CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM , 2016, HPCA.

[7]  Tao Li,et al.  Characterizing and mitigating the impact of process variations on phase change based memory systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Y.J. Song,et al.  Two-bit cell operation in diode-switch phase change memory cells with 90nm technology , 2008, 2008 Symposium on VLSI Technology.

[9]  D. Ielmini,et al.  Reliability study of phase-change nonvolatile memories , 2004, IEEE Transactions on Device and Materials Reliability.

[10]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[11]  Jun Yang,et al.  Frequent value compression in data caches , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[12]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[13]  Norman P. Jouppi,et al.  FREE-p: Protecting non-volatile memory against both hard and soft errors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[14]  Hsien-Hsin S. Lee,et al.  SAFER: Stuck-At-Fault Error Recovery for Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[15]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[16]  David A. Wood,et al.  Adaptive cache compression for high-performance processors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[17]  Karin Strauss,et al.  Use ECP, not ECC, for hard failures in resistive memories , 2010, ISCA.

[18]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[19]  Mattan Erez,et al.  RelaxFault Memory Repair , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[20]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  Onur Mutlu,et al.  Base-delta-immediate compression: Practical data compression for on-chip caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[22]  Cloyce D. Spradling SPEC CPU2006 benchmark tools , 2007, CARN.

[23]  Jiwu Shu,et al.  Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[24]  Per Stenström,et al.  SC2: A statistical compression cache scheme , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[25]  Hsien-Hsin S. Lee,et al.  Tri-level-cell phase change memory: toward an efficient and reliable memory system , 2013, ISCA.

[26]  M. Welbourne More on Moore , 1992 .