Memory Reliability Improvement Based on Maximized Error-Correcting Codes

Error-correcting codes (ECC) offer an efficient way to improve the reliability and yield of memory subsystems. ECC-based protection is usually provided on a memory-word basis, such that the number of data-bits in a codeword corresponds to the amount of information that can be transferred during a single memory access operation. Consequently, the codeword length is not the maximum allowed by a given number of check-bits, since the number of data-bits is constrained by the width of the memory data interface. This work investigates the additional error correction opportunities offered by the absence of a perfect match between the numbers of data-bits and check-bits in some widespread ECCs. Methods are proposed for the selection of multi-bit errors that can be additionally corrected with a minimal impact on ECC decoder latency. These methods were applied to single-bit error correction (SEC) codes and double-bit error correction (DEC) codes. Reliability improvements are evaluated for memories in which all errors affecting the same number of bits in a codeword are independent and identically distributed. It is shown that the application of the proposed methods to conventional DEC codes can improve the mean-time-to-failure (MTTF) of memories by up to 30%. Maximized versions of the DEC codes are also proposed in which all adjacent triple-bit errors become correctable without affecting the maximum number of triple-bit errors that can be made correctable.
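
The mismatch between data-bits and check-bits described above can be illustrated with a simple counting argument. The sketch below is not the selection method of the paper; it only counts how many non-zero syndromes remain unused once every error of weight up to t has been assigned its own syndrome. The function name `spare_syndromes` and the example parameters (6 and 7 check-bits for SEC Hamming codes, 12 and 14 check-bits for DEC BCH codes on 32-bit and 64-bit words) are assumptions chosen for illustration.

```python
from math import comb


def spare_syndromes(data_bits: int, check_bits: int, t: int) -> int:
    """Count non-zero syndromes left unused by a t-error-correcting code.

    Illustrative helper (hypothetical name). Each correctable error pattern
    needs a distinct non-zero syndrome, so the result is an upper bound on
    the number of extra multi-bit errors that could be made correctable.
    """
    n = data_bits + check_bits                 # codeword length
    total = 2 ** check_bits - 1                # all non-zero syndromes
    used = sum(comb(n, i) for i in range(1, t + 1))  # errors of weight 1..t
    if used > total:
        raise ValueError("too few check-bits for the requested correction capability")
    return total - used


if __name__ == "__main__":
    # Assumed example parameters: (data-bits, check-bits, t)
    for k, r, t in [(32, 6, 1), (64, 7, 1), (32, 12, 2), (64, 14, 2)]:
        print(f"k={k:2d} r={r:2d} t={t}: {spare_syndromes(k, r, t)} unused syndromes")
```

Under these assumed parameters, a DEC code on 32 data-bits leaves 3105 unused syndromes, far fewer than the 13244 possible triple-bit errors in a 44-bit codeword, which is why only a selected subset of multi-bit errors can be made additionally correctable.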
