An effective soft error detection mechanism using redundant instructions

Computer systems that operate in the space environment are subject to radiation phenomena that lead to soft errors and can cause unpredictable behaviour in computer-based systems. Commercial Off-The-Shelf (COTS) equipment, which is commonly used in space missions, cannot tolerate threats such as Single Event Upsets (SEUs). Such equipment therefore needs to be hardened against these threats. In this paper, a software instruction-level method called Soft Error Detection using Redundant Instructions (SEDRI) is presented to detect soft errors that affect control flow and program data. The method is evaluated by fault injection on several C benchmark programs. The experimental results show that, when a program is not protected against control-flow and data errors, 34% of these errors affect the program and damage it; with our method, this rate decreases to about 11%. Compared with previously presented techniques, SEDRI considerably improves performance overhead and memory overhead, by 46% and 55% respectively, while its fault coverage decreases by about 9%.
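To make the idea of redundant instructions concrete, the following is a minimal C sketch of a generic duplicate-and-compare check on program data. It is not the SEDRI transformation itself, which the abstract does not specify, and it does not cover control-flow errors. The names `checked_result`, `checked_mul_add` and the `fault_mask` parameter are hypothetical and exist only for illustration; `fault_mask` simulates a bit flip in the redundant copy of one operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the computed value plus a detection flag. */
typedef struct {
    uint32_t value;
    bool error_detected;
} checked_result;

/*
 * Illustrative sketch only (assumed names, not the paper's method).
 * Computes a * b + c twice, on a master copy and a shadow copy of the
 * operands, then compares the two results. A non-zero fault_mask is
 * XORed into the shadow copy of `a` to simulate a single event upset.
 */
checked_result checked_mul_add(uint32_t a, uint32_t b, uint32_t c,
                               uint32_t fault_mask)
{
    /* volatile discourages the compiler from merging the two computations. */
    volatile uint32_t a1 = a, b1 = b, c1 = c;   /* master copies */
    volatile uint32_t a2 = a, b2 = b, c2 = c;   /* shadow copies */

    a2 ^= fault_mask;                           /* simulated bit flip */

    uint32_t r1 = a1 * b1 + c1;                 /* original instruction stream */
    uint32_t r2 = a2 * b2 + c2;                 /* redundant instruction stream */

    checked_result out = { r1, r1 != r2 };      /* mismatch signals a soft error */
    return out;
}
```

With `fault_mask` set to 0 the two streams agree and no error is reported. With `fault_mask` set to 1 and operands 3, 4, 5, the shadow stream computes a different result and the mismatch is flagged. A flip whose effect is masked (for example when `b` is 0) goes undetected in this sketch, but it also leaves the result unchanged.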
