Accelerated Simulated Fault Injection Testing

Fault injection testing assesses the reliability of execution environments for critical software. It supports early testing of safety concepts that mitigate the impact of hardware failures on software behavior. The growing use of platform software in embedded systems creates a need to verify safety concepts that execute on top of operating systems and middleware platforms. Current fault injection techniques treat the resulting software stack as a single black box and attempt to test how all components react to faults. The software under test therefore becomes very complex, which in turn requires a very high number of fault injection experiments. Testing the software components (control functions, operating systems, and middleware) individually would significantly reduce the number of experiments required. In this paper, we present a novel approach to fault injection testing that considers the individual components of a software stack, enables reuse of previously collected evidence, allows testing to focus on highly critical parts of the control software, and significantly lowers the number of experiments required.
