FIMSIM: A fault injection infrastructure for microarchitectural simulators

Fault injection is a widely used approach for experiment-based dependability evaluation. Injecting faults to microarchitectural simulators is particularly appealing for researchers, since it can be utilized at the early design stage of the processor. As such, it enables a preliminary analysis of the correlation between the criticality of processor-structure level faults and their impact on applications. In this study, we present FIMSIM, a compact fault injection infrastructure for microarchitectural simulators which is capable of injecting transient, permanent, intermittent and multi-bit faults. FIMSIM provides the opportunity to comprehensively evaluate the vulnerability of different microarchitectural structures against different fault models.

[1]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[2]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[3]  Cristian Constantinescu,et al.  Impact of deep submicron technology on dependability of VLSI circuits , 2002, Proceedings International Conference on Dependable Systems and Networks.

[4]  Pedro J. Gil,et al.  Study, comparison and application of different VHDL-based fault injection techniques for the experimental validation of a fault-tolerant system , 2003, Microelectron. J..

[5]  Cristian Constantinescu,et al.  Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.

[6]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[7]  Babak Falsafi,et al.  Fingerprinting: bounding soft-error-detection latency and bandwidth , 2004, IEEE Micro.

[8]  Tryggve Fossum,et al.  Cache scrubbing in microprocessors: myth or necessity? , 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings..

[9]  Sarita V. Adve,et al.  The impact of technology scaling on lifetime reliability , 2004, International Conference on Dependable Systems and Networks, 2004.

[10]  Robert Baumann,et al.  Soft errors in advanced computer systems , 2005, IEEE Design & Test of Computers.

[11]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[12]  Babak Falsafi,et al.  Reunion: Complexity-Effective Multicore Redundancy , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[13]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[14]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[15]  Senior Member Intermittent Faults in VLSI Circuits , 2006 .

[16]  Sanjay J. Patel,et al.  ReStore: Symptom-Based Soft Error Detection in Microprocessors , 2006, IEEE Trans. Dependable Secur. Comput..

[17]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[18]  Koushik Chakraborty,et al.  Adapting to intermittent faults in multicore systems , 2008, ASPLOS.

[19]  Michail Maniatakos,et al.  On the Correlation between Controller Faults and Instruction-Level Errors in Modern Microprocessors , 2008, 2008 IEEE International Test Conference.

[20]  Sarita V. Adve,et al.  Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.

[21]  Pedro J. Gil,et al.  Analysis of the influence of intermittent faults in a microcontroller , 2008, 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems.

[22]  J. Draper,et al.  Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs , 2008, ESSCIRC 2008 - 34th European Solid-State Circuits Conference.

[23]  Sarita V. Adve,et al.  Accurate microarchitecture-level fault modeling for studying hardware faults , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[24]  Nematollah Bidokhti SEU concept to reality (allocation, prediction, mitigation) , 2010, 2010 Proceedings - Annual Reliability and Maintainability Symposium (RAMS).

[25]  John Lach,et al.  Transient fault models and AVF estimation revisited , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[26]  Mateo Valero Cortés,et al.  FaulTM: Fault-Tolerance Using Hardware Transactional Memory , 2010 .

[27]  Karthik Pattabiraman,et al.  Towards understanding the effects of intermittent hardware faults on programs , 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).

[28]  Michail Maniatakos,et al.  Instruction-Level Impact Analysis of Low-Level Faults in a Modern Microprocessor Controller , 2011, IEEE Transactions on Computers.

[29]  Mateo Valero,et al.  SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.