One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors

Recent studies have shown that technology and voltage scaling are expected to increase the likelihood that particle-induced soft errors manifest as multiple-bit errors. This raises concerns about the validity of using single bit-flips to assess the impact of soft errors in fault injection experiments. The goal of this paper is to investigate whether multiple-bit errors could cause a higher percentage of silent data corruptions (SDCs) than single-bit errors. Based on 2700 fault injection campaigns with 15 benchmark programs, featuring a total of 27 million experiments, our results show that single-bit errors in most cases yield a higher percentage of SDCs than multiple-bit errors. However, in 8% of the campaigns we observed a higher percentage of SDCs for multiple-bit errors. For most of these campaigns, the highest percentage of SDCs was obtained by flipping at most 3 bits. Moreover, we propose three ways of pruning the error space based on the results.
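
To make the distinction between single and multiple bit-flip errors concrete, the following is a minimal sketch of flipping a chosen number of distinct bits in one data word. It is an illustration only, not the fault injection tool used in the study: the function names `flip_bits` and `hamming_distance`, the 32-bit default width, and the uniform random choice of bit positions are all assumptions made for this example.

```python
import random


def flip_bits(value, num_flips, width=32, rng=None):
    """Return `value` with `num_flips` distinct, randomly chosen bits inverted.

    Hypothetical helper for illustration. Assumes `value` fits in `width`
    bits and that 0 <= num_flips <= width.
    """
    if rng is None:
        rng = random.Random()
    # Choose distinct bit positions so that exactly `num_flips` bits change.
    positions = rng.sample(range(width), num_flips)
    mask = 0
    for pos in positions:
        mask |= 1 << pos
    # XOR inverts the selected bits; the final AND keeps the result in range.
    return (value ^ mask) & ((1 << width) - 1)


def hamming_distance(a, b):
    """Count the bit positions in which `a` and `b` differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs pick the same bits
    original = 0x12345678
    for k in (1, 2, 3):
        corrupted = flip_bits(original, k, rng=rng)
        print(k, hex(corrupted), hamming_distance(original, corrupted))
```

With `num_flips=1` this corresponds to a single bit-flip error, and larger values correspond to multiple-bit errors confined to one word. Errors that span several words or several points in the execution would need a different model than this sketch covers.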
