SEDSR: Soft Error Detection Using Software Redundancy

This paper presents a new method for soft error detection using software redundancy (SEDSR) that is able to detect transient faults. Soft errors damage the control flow and data of programs and designers usually use hardware-based solutions to handle them. Software-based techniques for soft error detection force less cost and delay to systems and do not change their configuration. Therefore, these kinds of methods are appropriate alternatives for hardware-based techniques. SEDSR has two separate parts for data and control flow errors detection. Fault injection method is used to compare SEDSR with previous methods of this field based on the new parameter of “Evaluation Factor” that takes in account fault coverage, memory and performance overheads. These parameters are important in real time safety critical applications. Experimental results on SPEC2000 and some traditional benchmarks of this field show that SEDSR is much better than previous methods of this field. SEDSR’s evaluation factor is about 50% better than other methods of this field. These results show its success in satisfaction of the existing tradeoff between fault coverage, performance and memory overheads.

[1]  Herbert Bos,et al.  Can We Make Operating Systems Reliable and , 2006 .

[2]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[3]  Jacob A. Abraham,et al.  CEDA: Control-Flow Error Detection Using Assertions , 2011, IEEE Transactions on Computers.

[4]  Bogdan Nicolescu,et al.  Detecting Soft Errors by a Purely Software Approach: Method, Tools and Experimental Results , 2003, DATE.

[5]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[6]  Bingrong Hong,et al.  Software implemented transient fault detection in space computer , 2007 .

[7]  Subhasish Mitra,et al.  S: Error Detection by Diverse Data and Duplicated Instructions , 2002 .

[8]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[9]  Seyed Ghassem Miremadi,et al.  CFCET: A hardware-based control flow checking technique in COTS processors using execution tracing , 2006, Microelectron. Reliab..

[10]  Bingrong Hong,et al.  On-line control flow error detection using relationship signatures among basic blocks , 2010, Comput. Electr. Eng..

[11]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[12]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[13]  Suku Nair,et al.  Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection , 1999, IEEE Trans. Parallel Distributed Syst..

[14]  Jacob A. Abraham,et al.  CEDA: control-flow error detection through assertions , 2006, 12th IEEE International On-Line Testing Symposium (IOLTS'06).