A Fault Tolerant Approach to Detect Transient Faults in Microprocessors Based on a Non-Intrusive Reconfigurable Hardware

This paper presents a non-intrusive hybrid fault detection approach that combines hardware and software techniques to detect transient faults in microprocessors. Such faults have a major impact on microprocessor-based systems, affecting both data flow and control flow. To protect the system, an application-oriented hardware module is automatically generated and reconfigured on the system at runtime. Combined with software-based fault tolerance techniques, this solution offers full system protection against transient faults. A fault injection campaign is performed on a MIPS microprocessor executing a set of applications. The HW/SW implementation on a reprogrammable platform shows lower memory area and execution time overheads than related work. Fault injection results demonstrate the effectiveness of the method, which detected 100% of the injected faults.
