Program-Structure-Guided Approximation of Large Fault Spaces

Due to shrinking structure sizes and operating voltages, hardware is becoming more susceptible to transient faults. Fault-injection campaigns are a common approach to systematically assess the resilience of a system and the effectiveness of software-based countermeasures. However, experimentally injecting all possible faults to achieve full fault-space coverage is infeasible in practice. While precise pruning techniques, such as def/use pruning, already reduce the campaign size significantly, the number of remaining injections is still challenging even for medium-sized systems. We propose fault-space regions (FSRs) as a method to approximately cover the complete fault space with significantly fewer injections. Instead of probabilistically subsampling the fault space, our approximation exploits the actual program structure and execution trace (e.g., the flow of basic blocks) to identify injection points that are representative of a larger set of faults. We identify such data-flow regions and inject only into data values that flow across region boundaries. In this way, we further reduce the number of injections by up to 76 percent, while the results deviate by less than 2.7 percent from those of a complete and precise fault-injection campaign. Furthermore, the locality of the results regarding silent data corruptions is preserved, with a deviation of less than 6.9 percent.
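The idea of injecting only into values that cross region boundaries can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how regions are formed or how results for skipped faults are approximated. The `Access` record, its `region` field (a hypothetical region id, for instance one per basic-block execution), and both function names are assumptions made for illustration. The first function performs a simplified def/use pruning over a memory-access trace, and the second keeps only the classes whose defining access and consuming read lie in different regions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Access:
    time: int    # dynamic instruction count at which the access happens
    addr: int    # accessed memory location
    kind: str    # "R" for read, "W" for write
    region: int  # hypothetical region id (assumption, not from the paper)


# One equivalence class: (addr, first_time, last_time, src_region, dst_region)
FaultClass = Tuple[int, int, int, int, int]


def def_use_classes(trace: List[Access]) -> List[FaultClass]:
    """Simplified def/use pruning: one class per interval that ends in a read.

    Intervals that end in a write are overwritten and therefore skipped.
    Reads of locations never accessed before are ignored for brevity.
    """
    last: Dict[int, Tuple[int, int]] = {}  # addr -> (time, region) of last access
    classes: List[FaultClass] = []
    for acc in trace:
        prev = last.get(acc.addr)
        if acc.kind == "R" and prev is not None:
            classes.append((acc.addr, prev[0] + 1, acc.time, prev[1], acc.region))
        last[acc.addr] = (acc.time, acc.region)
    return classes


def cross_region_classes(classes: List[FaultClass]) -> List[FaultClass]:
    """Keep only classes whose value flows from one region into another."""
    return [c for c in classes if c[3] != c[4]]


trace = [
    Access(1, 0x10, "W", 0),
    Access(2, 0x10, "R", 0),  # defined and used inside region 0
    Access(3, 0x14, "W", 0),
    Access(5, 0x10, "R", 1),  # value flows from region 0 into region 1
    Access(6, 0x14, "W", 1),  # overwrites 0x14: no injection needed
    Access(7, 0x14, "R", 1),  # defined and used inside region 1
]

all_classes = def_use_classes(trace)
selected = cross_region_classes(all_classes)
print(len(all_classes), len(selected))
```

In this toy trace, def/use pruning alone leaves three injection points, and the region filter keeps the single class whose value crosses from region 0 into region 1. The sketch only selects injection points; it does not model how outcomes for the faults that are skipped would be estimated.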
