Combinational Logic Circuit Protection Using Customized Error Detecting and Correcting Codes

Detecting and correcting errors is much more difficult in logic circuits than in memories. The regular structure of memories allows concurrent error detection and correction mechanisms to be incorporated efficiently, whereas the irregular structure of logic circuits presents a much greater challenge. One approach to handling soft errors is to use concurrent error detection (CED) circuitry that monitors the circuit outputs for the occurrence of an error. Once an error is detected, the system can recover and thereby prevent a failure. In environments with a high soft error rate, and for systems with stringent reliability and availability requirements, error detection alone may not be sufficient. Triple modular redundancy (TMR) can mask all single faults, but its overhead can be unacceptably high for the targeted applications. This paper presents a low-overhead, non-intrusive technique that detects and corrects the most likely soft errors using customized ad hoc error detecting and correcting (EDAC) linear block codes. Employing the proposed EDAC scheme can dramatically reduce the failure rate and increase the mean time to failure (MTTF) of logic circuits with limited overhead. The approach could be very useful for applications with high availability and low cost requirements, e.g., network servers and query servers. The linearity of the codes allows efficient synthesis of the parity prediction logic. Experimental results demonstrate the effectiveness of the proposed scheme.
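
To make the idea of parity prediction with a linear block code concrete, the following is a minimal sketch and not the paper's actual construction. The toy circuit, the parity matrix `P`, and every function name are assumptions chosen for illustration. The sketch also assumes that a fault corrupts at most one functional output and never the prediction logic itself. Each check bit is the XOR of a group of outputs, so the prediction logic can be simplified independently of the circuit: here it reduces to `a ^ b` and `a | b`.

```python
from itertools import product

# Hypothetical 2-input, 3-output combinational circuit (illustration only).
def circuit(a, b):
    return [a & b, a | b, a ^ b]

# Each row defines one check bit as the XOR of the outputs marked with 1.
# The columns (1,1), (1,0), (0,1) are distinct and nonzero, so every
# single-output error produces a unique syndrome.
P = [
    [1, 1, 0],
    [1, 0, 1],
]

def check_bits(outputs):
    """Recompute the check bits from the observed circuit outputs."""
    return [sum(p & o for p, o in zip(row, outputs)) % 2 for row in P]

def predict_check_bits(a, b):
    """Parity prediction logic, derived from the inputs alone.

    (a&b) ^ (a|b) simplifies to a ^ b, and (a&b) ^ (a^b) simplifies to a | b.
    """
    return [a ^ b, a | b]

def correct(outputs, predicted):
    """Compare recomputed and predicted check bits, then correct if possible."""
    syndrome = [c ^ p for c, p in zip(check_bits(outputs), predicted)]
    if not any(syndrome):
        return list(outputs), "ok"
    for j in range(len(outputs)):
        if [row[j] for row in P] == syndrome:
            fixed = list(outputs)
            fixed[j] ^= 1  # flip the output bit that the syndrome points to
            return fixed, "corrected"
    return list(outputs), "detected"

def all_single_errors_corrected():
    """Inject every single-output error for every input and check recovery."""
    for a, b in product([0, 1], repeat=2):
        golden = circuit(a, b)
        for j in range(len(golden)):
            faulty = list(golden)
            faulty[j] ^= 1
            if correct(faulty, predict_check_bits(a, b)) != (golden, "corrected"):
                return False
    return True
```

In this toy code every nonzero syndrome is mapped to an output bit, so an error inside the prediction logic would be miscorrected. A customized code of the kind the abstract describes would instead select its check bits to cover the errors judged most likely for the specific circuit.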
