GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates

Dependable computing on unreliable substrates is the next challenge the computing community needs to overcome due to both manufacturing limitations in low geometries and the necessity to aggressively minimize power consumption. System designers often need to analyze the way hardware faults manifest as errors at the architectural level and how these errors affect application correctness. This paper introduces GemFI, a fault injection tool based on the cycle accurate full system simulator Gem5. GemFI provides fault injection methods and is easily extensible to support future fault models. It also supports multiple processor models and ISAs and allows fault injection in both functional and cycle-accurate simulations. GemFI offers fast-forwarding of simulation campaigns via check pointing. Moreover, it facilitates the parallel execution of campaign experiments on a network of workstations. In order to validate and evaluate GemFI, we used it to apply fault injection on a series of real-world kernels and applications. The evaluation indicates that its overhead compared with Gem5 is minimal (up to 3.3%), whereas optimizations such as fast-forwarding via check pointing and execution on NoWs can significantly reduce simulation time of a fault injection campaign.

[1]  Henrique Madeira,et al.  RIFLE: A General Purpose Pin-level Fault Injector , 1994, EDCC.

[2]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[3]  Touradj Ebrahimi,et al.  The JPEG 2000 still image compression standard , 2001, IEEE Signal Process. Mag..

[4]  Johan Karlsson,et al.  Design Guidelines of a VHDL-based Simulation Tool for the Validation of Fault Tolerance , 1997 .

[5]  Daniel P. Siewiorek,et al.  A Methodology for the Rapid Injection of Transient Hardware Errors , 1996, IEEE Trans. Computers.

[6]  Gene Cooperman,et al.  DMTCP: Transparent checkpointing for cluster computations and the desktop , 2007, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[7]  Daniel P. Siewiorek,et al.  Fault Injection Experiments Using FIAT , 1990, IEEE Trans. Computers.

[8]  Jiri Gaisler A portable and fault-tolerant microprocessor based on the SPARC v8 architecture , 2002, Proceedings International Conference on Dependable Systems and Networks.

[9]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[10]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[11]  Daniel P. Siewiorek,et al.  Effects of transient gate-level faults on program behavior , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[12]  Feng Wu,et al.  Overview of AVS video standard , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[13]  S. Welstead Fractal and Wavelet Image Compression Techniques , 1999 .

[14]  Jean Arlat,et al.  Fault Injection for Dependability Validation: A Methodology and Some Applications , 1990, IEEE Trans. Software Eng..

[15]  Volkmar Sieh,et al.  VERIFY: evaluation of reliability using VHDL-models with embedded fault descriptions , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.

[16]  Régis Leveugle,et al.  Statistical fault injection: Quantified error and confidence , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[17]  Johan Karlsson,et al.  Fault injection into VHDL models: the MEFISTO tool , 1994 .