Efficient Soft Error Vulnerability Analysis Using Non-intrusive Fault Injection Techniques

Electronic computing systems are integrating modern multicore processors and GPUs aiming to perform complex software stacks in different life-critical systems, including health devices and emerging self-driving cars. Such systems are expected to experience at least one soft error per day in the near future, which may lead to life-threatening failures. To prevent these failures, critical system must be tested and verified while under realistic workloads. This paper presents four novel non-intrusive fault injection techniques that enable full fault injection control and inspection of multicore systems behavior in the presence of faults. Proposed techniques were integrated into a fault injection framework and verified through a real automotive case study with up to 43 billions instructions. Results show that compared to traditional methods, the new techniques can increase the efficiency of fault injection campaigns during early development phase by 32.28%.

[1]  Franck Cappello,et al.  Addressing failures in exascale computing , 2014, Int. J. High Perform. Comput. Appl..

[2]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[3]  R.C. Baumann,et al.  Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.

[4]  Aviral Shrivastava,et al.  nZDC: A compiler technique for near Zero Silent Data Corruption , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[5]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[6]  Jacob A. Abraham,et al.  Quantitative evaluation of soft error injection techniques for robust system design , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[7]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[8]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Aviral Shrivastava,et al.  gemV: A validated toolset for the early exploration of system reliability , 2016, 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[10]  Ricardo Reis,et al.  A fast and scalable fault injection framework to evaluate multi/many-core soft error reliability , 2015, 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS).

[11]  Song Fu,et al.  F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[12]  Fernanda Gusmão de Lima Kastensmidt,et al.  Soft error injection methodology based on QEMU software platform , 2014, LATW.

[13]  Laura Monroe,et al.  Design, Use and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications , 2016, SimuTools.

[14]  Amin Ansari,et al.  Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS 2010.

[15]  Ricardo Reis,et al.  Non-intrusive Fault Injection Techniques for Efficient Soft Error Vulnerability Analysis , 2019, 2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC).

[16]  David H. Bailey,et al.  The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[17]  Sarita V. Adve,et al.  Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults , 2012, ASPLOS XVII.

[18]  Dimitris Gizopoulos,et al.  Differential Fault Injection on Microarchitectural Simulators , 2015, 2015 IEEE International Symposium on Workload Characterization.