Reconfigurable ECC for adaptive protection of memory

Post-silicon healing techniques that rely on built-in redundancy (e.g., row/column redundancy) remain effective at repairing manufacturing defects and failures induced by process variation in nanoscale memory. They are, however, not effective at improving robustness against the various failures that arise at run time. The growing incidence of run-time failures in memory, especially in low-voltage, low-power memory, has emerged as a major design challenge. Traditionally, run-time error resilience is provided by applying uniform, worst-case Error Correction Code (ECC) protection to all blocks of a large memory array. However, because the intrinsic reliability of memory blocks shifts both spatially (from block to block) and temporally (over time), such uniform protection is either over-provisioned, wasting ECC overhead on reliable blocks, or under-provisioned for the weakest ones. We propose a novel Reconfigurable ECC approach that adapts, in space and time, to the varying reliability of memory blocks by applying to each block the amount of ECC protection it needs at a given time. We show that this approach is extremely effective across diverse applications.
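To make the idea of "the right amount of protection" concrete, here is a minimal sketch of a per-block ECC-strength selector. It is not the authors' scheme; every name and number in it is a hypothetical assumption: 512-bit data blocks, a menu of t-error-correcting binary BCH configurations over GF(2^10) whose parity overhead is taken as the standard upper bound of m*t bits, a per-block bit error rate (BER) estimate that is somehow available, independent bit errors (a binomial model), and a per-word failure budget of 1e-15. For each block it picks the cheapest t whose residual probability of more than t errors meets the budget, and re-running the selection as a block's BER estimate changes models the temporal adaptation.

```python
import math
from typing import Optional

# Hypothetical configuration: shortened binary BCH over GF(2^10) (codeword <= 1023 bits).
# For binary BCH, parity bits <= m * t; this sketch uses that bound as the overhead.
DATA_BITS = 512   # hypothetical data bits per protected block
GF_M = 10         # hypothetical Galois-field degree m
T_MAX = 8         # hypothetical strongest configuration the decoder supports


def parity_bits(t: int) -> int:
    """Upper-bound parity overhead of a t-error-correcting binary BCH code."""
    return GF_M * t


def word_failure_prob(n_bits: int, ber: float, t: int) -> float:
    """P(more than t bit errors in an n-bit codeword), assuming independent bit
    errors with probability `ber` (binomial model). The tail is summed term by
    term in log space so that very small probabilities keep their precision."""
    if not 0.0 <= ber < 1.0:
        raise ValueError("ber must be in [0, 1)")
    if ber == 0.0:
        return 0.0
    log_p, log_q = math.log(ber), math.log1p(-ber)
    log_n_fact = math.lgamma(n_bits + 1)
    total = 0.0
    for k in range(t + 1, n_bits + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        total += math.exp(log_term)  # underflows harmlessly to 0.0 for large k
    return total


def select_ecc_strength(ber: float, target: float) -> Optional[int]:
    """Return the weakest (cheapest) t in 1..T_MAX whose residual word-failure
    probability meets `target`, or None if no supported configuration does."""
    for t in range(1, T_MAX + 1):
        n_bits = DATA_BITS + parity_bits(t)
        if word_failure_prob(n_bits, ber, t) <= target:
            return t
    return None


if __name__ == "__main__":
    TARGET = 1e-15  # hypothetical per-word failure budget
    # Spatial adaptation: blocks with different (hypothetical) estimated BERs.
    blocks = {"block0": 1e-12, "block1": 1e-9, "block2": 1e-6}
    for name, ber in blocks.items():
        t = select_ecc_strength(ber, TARGET)
        overhead = parity_bits(t) if t is not None else None
        print(name, "BER", ber, "-> t =", t, "parity bits =", overhead)
    # Temporal adaptation: re-run the selection as one block's estimated BER grows.
    for ber in (1e-12, 1e-9, 1e-6, 1e-3):
        print("BER", ber, "-> t =", select_ecc_strength(ber, TARGET))
```

In this toy setup the three blocks need t = 1, 2 and 4 respectively, so uniform protection sized for the weakest block would spend 40 parity bits on every block while the healthiest one needs only 10; at a BER of 1e-3 no supported configuration meets the budget and the caller must handle the block some other way. Real multi-bit upsets are spatially correlated, so the independent-error model is optimistic, and a practical allocator would rely on measured or modeled failure rates rather than this binomial estimate.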
