Software detection mechanisms providing full coverage against single bit-flip faults

The growing design complexity of current and future microelectronic technologies increases their sensitivity to transient bit-flip errors. Such errors can cause unpredictable behavior and compromise both data integrity and system availability. This work proposes new solutions that detect all classes of faults, including those that escape conventional software detection mechanisms, and thereby provide full protection against transient bit-flip errors. The proposed solutions are particularly well suited to low-cost, safety-critical, microprocessor-based applications. They were validated through exhaustive fault-injection experiments on a set of real and synthetic benchmark programs. The fault model considered is a single bit flip corrupting a memory cell that is accessible to the user through the processor instruction set. The results demonstrate the effectiveness of the proposed solutions.
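
To make the fault model and the idea of an exhaustive injection campaign concrete, the following is a minimal sketch in Python. It is not the mechanism proposed in this work: it uses plain variable duplication with a final comparison, a toy summation workload, and an assumed 8-bit word, and all names (`run`, `campaign`, `acc`, `acc_dup`) are hypothetical. Every single bit flip (injection point, memory cell, bit position) is injected once and the outcome is classified as detected, benign, or silent corruption.

```python
from collections import Counter
from itertools import product

WORD_BITS = 8                      # assumed word width, kept small on purpose
MASK = (1 << WORD_BITS) - 1
CELLS = ("acc", "acc_dup")         # user-visible memory cells of the toy program


def run(data, fault=None, check=True):
    """Sum `data` into an accumulator and its duplicate.

    `fault` is None or a tuple (step, cell, bit): one bit of `cell` is
    flipped just before `step`; step == len(data) means right before the
    final comparison. Returns (result, detected).
    """
    mem = {"acc": 0, "acc_dup": 0}
    for step in range(len(data) + 1):
        if fault is not None and fault[0] == step:
            _, cell, bit = fault
            mem[cell] ^= 1 << bit  # the single bit-flip
        if step < len(data):
            mem["acc"] = (mem["acc"] + data[step]) & MASK
            mem["acc_dup"] = (mem["acc_dup"] + data[step]) & MASK
    # Software check: the two copies must agree at the end of the run.
    detected = check and mem["acc"] != mem["acc_dup"]
    return mem["acc"], detected


def campaign(data, check=True):
    """Inject every possible single bit-flip once and classify the outcome."""
    golden, _ = run(data)          # fault-free reference result
    outcomes = Counter()
    for fault in product(range(len(data) + 1), CELLS, range(WORD_BITS)):
        result, detected = run(data, fault, check)
        if detected:
            outcomes["detected"] += 1
        elif result == golden:
            outcomes["benign"] += 1
        else:
            outcomes["silent"] += 1
    return outcomes


if __name__ == "__main__":
    workload = [3, 5, 7, 11]
    print("with check:   ", dict(campaign(workload, check=True)))
    print("without check:", dict(campaign(workload, check=False)))
```

In this toy setting the duplicated accumulator catches every injected flip, while disabling the check turns the flips in `acc` into silent corruptions. Real programs also hold control-flow state, addresses, and other registers, which is where simple duplication can miss faults; the sketch does not attempt to reproduce those cases.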
