Analyzing and increasing soft error resilience of Deep Neural Networks on ARM processors

Abstract Deep Neural Networks (DNNs) have been successfully deployed in safety-critical applications because of their ability to handle complex tasks. Because of their low energy consumption, ARM (Advanced RISC Machine) processors are widely used to run DNNs in embedded applications. In harsh environments, however, soft errors induced by radiation strikes may cause Silent Data Corruptions (SDCs) and Detected Unrecoverable Errors (DUEs). In this work, we evaluate the soft error resilience of the register file when running DNNs and analyze the impact of compiler optimizations. The results show that compiler optimization significantly degrades the reliability of DNNs. Furthermore, we track SDC propagation and record the execution time of each layer. The results indicate that, for most DNNs, convolutional layers are the most vulnerable because they are the most time-consuming parts. At the instruction level, we evaluate the Program Vulnerability Factor (PVF) contributions of instructions and identify the vulnerable instructions that may cause critical SDCs. To mitigate critical SDCs, we propose two efficient approaches: 1) selective kernel hardening and 2) Symptom-based Duplication with Comparison (SDWC). The former reduces SDCs by an order of magnitude and incurs 33.56% time overhead. The latter reduces critical SDCs to 0 and incurs less than 10% time overhead. For DUEs, we propose an idempotency-based recovery approach, which mitigates more than 92.2% of DUEs and incurs 3.43% latency overhead on average.
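To make the terminology concrete, the following is a minimal sketch of how a single-bit soft error in a 32-bit register holding a floating-point value can be emulated, and how the resulting run can be compared against a fault-free (golden) run. It is an illustration under stated assumptions, not the fault-injection setup used in this work: the function names `flip_bit_float32` and `classify_outcome` are hypothetical, and the rule that an SDC is "critical" when it changes the top-1 class is an assumed definition that the abstract does not spell out.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 single-precision encoding flipped.

    This emulates a single-event upset in a 32-bit register (hypothetical helper).
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float as a 32-bit unsigned integer, flip one bit, convert back.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def classify_outcome(golden_scores, faulty_scores):
    """Compare the output scores of a faulty run against the golden run.

    Assumed definitions (not taken from the abstract):
      - "masked": the outputs are equal, so the fault had no visible effect;
      - "critical SDC": the outputs differ and the top-1 class changes;
      - "tolerable SDC": the outputs differ but the top-1 class is unchanged.
    """
    if faulty_scores == golden_scores:
        return "masked"
    golden_top = max(range(len(golden_scores)), key=golden_scores.__getitem__)
    faulty_top = max(range(len(faulty_scores)), key=faulty_scores.__getitem__)
    return "critical SDC" if faulty_top != golden_top else "tolerable SDC"


if __name__ == "__main__":
    golden = [0.25, 0.5, 0.125]  # made-up class scores, exactly representable in float32
    for bit in (0, 30):
        faulty = [flip_bit_float32(golden[0], bit)] + golden[1:]
        print(bit, classify_outcome(golden, faulty))
```

In this toy case, flipping the lowest mantissa bit of the first score perturbs it only slightly and leaves the predicted class unchanged, whereas flipping the most significant exponent bit turns 0.25 into a very large value and changes the predicted class. This is one intuition for why only a subset of faults leads to critical SDCs.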
