FT-ClipAct: Resilience Analysis of Deep Neural Networks and Improving their Fault Tolerance using Clipped Activation

Deep Neural Networks (DNNs) are being widely adopted for safety-critical applications, e.g., healthcare and autonomous driving. They are inherently considered to be highly error-tolerant. However, recent studies have shown that hardware faults that impact the parameters of a DNN (e.g., weights) can have a drastic impact on its classification accuracy. In this paper, we perform a comprehensive error resilience analysis of DNNs subjected to hardware faults (e.g., permanent faults) in the weight memory. The outcome of this analysis is leveraged to propose a novel error mitigation technique that squashes high-intensity faulty activation values to alleviate their impact. We achieve this by replacing the unbounded activation functions with their clipped versions. We also present a method to systematically define the clipping values of the activation functions so as to increase the resilience of the networks against faults. We evaluate our technique on the AlexNet and VGG-16 DNNs trained on the CIFAR-10 dataset. The experimental results show that our mitigation technique significantly improves the resilience of the DNNs to faults. For example, the proposed technique improves the classification accuracy of the resilience-optimized VGG-16 model by 68.92% on average at a fault rate of 1 × 10⁻⁵, compared to the baseline network without any fault mitigation.
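
The following is a minimal sketch of the clipped-activation idea described above, assuming a PyTorch implementation; the module name `ClippedReLU`, the helper `clip_activations`, and the choice of a single global threshold are illustrative assumptions, not the paper's exact procedure (the paper derives clipping values systematically per network).

```python
# Minimal sketch (PyTorch assumed): replace unbounded ReLUs with a clipped
# variant so that abnormally large activations caused by faulty weights
# (e.g., high-order bit flips) are saturated instead of propagating.
import torch
import torch.nn as nn


class ClippedReLU(nn.Module):
    """ReLU whose output is clipped to the range [0, threshold]."""

    def __init__(self, threshold: float):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Values above the threshold are squashed to the threshold,
        # bounding the impact of high-intensity faulty activations.
        return torch.clamp(x, min=0.0, max=self.threshold)


def clip_activations(model: nn.Module, threshold: float) -> nn.Module:
    """Recursively replace every nn.ReLU in `model` with ClippedReLU (in place)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, ClippedReLU(threshold))
        else:
            clip_activations(child, threshold)
    return model
```

As a hypothetical usage example, `clip_activations(vgg16_model, threshold=6.0)` would bound all ReLU outputs at 6.0; in practice the threshold would be chosen from fault-free activation statistics so that accuracy on clean inputs is preserved while faulty activations are suppressed.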
