An Investigation of the Fault Sensitivity of Four Benchmark Workloads

This paper presents an experimental study of the fault sensitivity of four programs included in the MiBench test suit. We investigate their fault sensitivity with respect to hardware faults that manifest as single bit flips in main memory locations and instruction set architecture registers. To this end, we have conducted extensive fault injection experiments with two versions of each program, one basic version and one where the program is equipped with software- implemented hardware fault tolerance (SIHFT) through triple time redundant execution, majority voting and forward recovery (TTR-FR). The results show that TTR-FR achieves an error coverage between 94.6% and 99.2%, while the non- fault-tolerant versions achieve an error coverage between 55.8% and 81.1%. To gain understanding of the origin of the non-covered faults, we provide statistics on the fault sensitivity of different source code blocks, physical fault locations (instruction set architecture registers and main memory words) and different categories of machine instructions.

[1]  Massimo Violante,et al.  A New Approach to Software-Implemented Fault Tolerance , 2004, J. Electron. Test..

[2]  Y. Savaria,et al.  Software detection mechanisms providing full coverage against single bit-flip faults , 2004, IEEE Transactions on Nuclear Science.

[3]  Alfredo Benso,et al.  A C/C++ source-to-source compiler for dependable applications , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[4]  Daniel P. Siewiorek,et al.  FIAT-fault injection based automated testing environment , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[5]  Johan Karlsson,et al.  Considering Workload Input Variations in Error Coverage Estimation , 1999, EDCC.

[6]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[7]  Johan Karlsson,et al.  Fault injection-based assessment of aspect-oriented implementation of fault tolerance , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[8]  Henrique Madeira,et al.  RIFLE: A General Purpose Pin-level Fault Injector , 1994, EDCC.

[9]  Wolfgang Schröder-Preikschat,et al.  Eliminating Single Points of Failure in Software-Based Redundancy , 2012, 2012 Ninth European Dependable Computing Conference.

[10]  Roger Johansson,et al.  On the Impact of Hardware Faults - An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions , 2012, SAFECOMP.

[11]  Johan Karlsson,et al.  Software Implemented Detection and Recovery of Soft Errors in a Brake-by-Wire System , 2008, 2008 Seventh European Dependable Computing Conference.

[12]  Bingrong Hong,et al.  Software implemented transient fault detection in space computer , 2007 .

[13]  Henrique Madeira,et al.  Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers , 1998, IEEE Trans. Software Eng..

[14]  Johan Karlsson,et al.  Assembly-Level Pre-injection Analysis for Improving Fault Injection Efficiency , 2005, EDCC.

[15]  Antonio Martínez-Álvarez,et al.  Compiler-Directed Soft Error Mitigation for Embedded Systems , 2012, IEEE Transactions on Dependable and Secure Computing.

[16]  Johan Karlsson,et al.  Two software techniques for on-line error detection , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[17]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[18]  Johan Karlsson,et al.  Using heavy-ion radiation to validate fault-handling mechanisms , 1994, IEEE Micro.

[19]  Johan Karlsson,et al.  Comparison of Physical and Software-Implemented Fault Injection Techniques , 2003, IEEE Trans. Computers.

[20]  Christof Fetzer,et al.  Software Encoded Processing: Building Dependable Systems with Commodity Hardware , 2007, SAFECOMP.

[21]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[22]  Zizhong Chen,et al.  Algorithm-Based Fault Tolerance for Fail-Stop Failures , 2008, IEEE Transactions on Parallel and Distributed Systems.

[23]  Johan Karlsson,et al.  GOOFI-2: A tool for experimental dependability assessment , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[24]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..