A cross-layer fault propagation analysis method for edge intelligence systems deployed with DNNs

Abstract

To evaluate the impact of soft errors on convolutional neural networks (CNNs) deployed in edge computing systems, we propose a data-driven assessment strategy that characterizes, in an interpretable way, how faults propagate across the hardware and software abstraction layers of the system. Single-bit flips are injected into the underlying hardware architecture of a virtual embedded system on which a CNN-based image classifier is deployed. Using generative adversarial networks and Bayesian networks to model the resulting data, we represent the local activation and global dependencies of soft errors across the system as a directed acyclic graph. The resulting cross-layer fault propagation paths and component sensitivities show that deep neural networks such as CNNs can effectively prevent faults that could cause critical failures from propagating to the system output, owing to channel sparsity and the regular pooling mechanism in the network pipeline.
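The sketch below is a minimal, software-level illustration of why channel sparsity and pooling can mask bit flips. It is not the hardware-level injection flow on a virtual embedded platform, nor the GAN and Bayesian-network analysis described above. It assumes a hypothetical one-dimensional feature map stored as IEEE-754 float32 values. Each bit of each activation is flipped once, and the result passes through ReLU followed by non-overlapping max pooling. A fault counts as masked if the pooled output equals the fault-free (golden) output. The helper names (`flip_bit`, `relu`, `max_pool`, `layer`, `masked_counts`) and the example activations are illustrative assumptions.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of value's IEEE-754 float32 encoding (bit 0 = mantissa LSB, bit 31 = sign)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def relu(xs):
    # NaN is not < 0.0, so a NaN produced by a flip propagates instead of being zeroed.
    return [0.0 if x < 0.0 else x for x in xs]


def max_pool(xs, window=2):
    # Non-overlapping 1-D max pooling; a NaN anywhere in a window poisons that window,
    # so the result does not depend on comparison order.
    out = []
    for i in range(0, len(xs), window):
        w = xs[i:i + window]
        out.append(math.nan if any(math.isnan(v) for v in w) else max(w))
    return out


def layer(xs):
    # Hypothetical stand-in for the tail of one CNN block: ReLU, then 2-wide max pooling.
    return max_pool(relu(xs))


def masked_counts(activations):
    """For each activation, count how many of its 32 single-bit flips leave the layer output unchanged."""
    golden = layer(activations)
    counts = []
    for i in range(len(activations)):
        masked = 0
        for bit in range(32):
            faulty = list(activations)
            faulty[i] = flip_bit(faulty[i], bit)
            if layer(faulty) == golden:
                masked += 1
        counts.append(masked)
    return counts


if __name__ == "__main__":
    acts = [-1.5, 0.75, 2.0, 0.25]  # illustrative values, not measured data
    counts = masked_counts(acts)
    print(counts)                          # [30, 0, 0, 31]
    print(sum(counts) / (32 * len(acts)))  # 0.4765625
```

In this toy run, flips in the negative activation are masked by ReLU unless they flip the sign or produce a NaN. Flips in the non-maximal pooled input are masked unless they push it above the window maximum. Every flip in the two activations that reach the output is visible. The result is only a qualitative picture of the masking effect attributed to channel sparsity and pooling; it does not model hardware fault sites or cross-layer dependencies.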
