Data Block Partitioning Methods to Mitigate Stuck-At Faults in Limited Endurance Memories

Deep scaling in conjunction with increased process variation has resulted in increasingly faulty memories. Emerging memories, particularly phase-change and resistive memories, can experience stuck-at faults due to limited endurance. Partition and flip (PAF) schemes partition data into blocks and invert these blocks as needed so that the written data matches the values of the stuck-at cells. In this paper, we propose two novel correction schemes that substantially enhance the fault-tolerance capabilities of existing PAF techniques. First, dynamic partitioning increases the number of possible configurations with an equivalent number of auxiliary bits. At high fixed error rates, the increase in configurations improves write error rates for flip-N-write and Aegis partitioning by 7%–72% and 5–53×, respectively. Our second novel partitioning method, relaxed partitioning, substantially enlarges the partitioning search space by specifying minimally overlapping configurations. In Monte Carlo simulations, data-aware dynamic partitioning tolerates 25% and 27% more faults over its lifetime than Aegis with 36 and 43 auxiliary bits per 512-bit data block, respectively, while relaxed partitioning achieves a further 15% and 24% improvement while requiring two fewer overhead bits per data block.
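The basic partition-and-flip idea described above can be illustrated with a minimal sketch. It assumes fixed-size contiguous blocks, one flip flag per block, and fault-free flag storage; the function names `encode_paf` and `decode_paf` and the fault map format are hypothetical and chosen only for illustration. It does not reproduce flip-N-write, Aegis, or the dynamic and relaxed partitioning schemes proposed in the paper.

```python
from typing import Dict, List, Optional, Tuple


def encode_paf(
    data: List[int], stuck: Dict[int, int], block_size: int
) -> Optional[Tuple[List[int], List[int]]]:
    """Encode `data` so every stuck-at cell receives the value it is stuck at.

    `stuck` maps a cell index to its stuck value (0 or 1); this fault map
    format is an assumption of the sketch. Returns (stored_bits, flip_flags),
    or None when some block cannot match its faults either as-is or inverted.
    """
    stored: List[int] = []
    flags: List[int] = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Faults that fall inside this block, re-indexed to block-local offsets.
        faults = {i - start: v for i, v in stuck.items()
                  if start <= i < start + len(block)}
        if all(block[i] == v for i, v in faults.items()):
            flags.append(0)              # block can be written unchanged
            stored.extend(block)
        elif all((block[i] ^ 1) == v for i, v in faults.items()):
            flags.append(1)              # block must be written inverted
            stored.extend(b ^ 1 for b in block)
        else:
            return None                  # faults in this block conflict
    return stored, flags


def decode_paf(stored: List[int], flags: List[int], block_size: int) -> List[int]:
    """Undo the per-block inversion recorded in `flags`."""
    out: List[int] = []
    for k, flag in enumerate(flags):
        block = stored[k * block_size:(k + 1) * block_size]
        out.extend(b ^ flag for b in block)
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0]
stuck = {1: 1, 6: 1}                     # cells 1 and 6 are stuck at 1
stored, flags = encode_paf(data, stuck, block_size=4)
recovered = decode_paf(stored, flags, block_size=4)
```

In this example the first block is stored inverted and the second unchanged, so both stuck cells hold the values they are stuck at and decoding returns the original data. A block that contains one fault agreeing with the data and another disagreeing with it cannot be written under this fixed partitioning, which is the kind of failure that a larger set of partitioning configurations is meant to avoid.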
