Reliability Analysis of Vision Transformers

Vision Transformers (ViTs) that leverage the self-attention mechanism have shown superior performance on many classical vision tasks compared to convolutional neural networks (CNNs) and have gained increasing popularity recently. Existing work on ViTs mainly optimizes performance and accuracy, but the reliability issues of ViTs induced by hardware faults in large-scale VLSI designs have generally been overlooked. In this work, we study the reliability of ViTs and, for the first time, investigate their vulnerability at different architectural granularities, ranging from models and layers to modules and patches. The investigation reveals that ViTs with the self-attention mechanism are generally more resilient in linear computing, including general matrix-matrix multiplication (GEMM) and fully connected (FC) layers, and show a relatively even vulnerability distribution across patches. However, ViTs involve more fragile non-linear computing, such as softmax and GELU, than typical CNNs. Based on these observations, we propose an adaptive algorithm-based fault tolerance (ABFT) scheme to protect linear computing implemented with GEMMs of distinct sizes, and apply a range-based protection scheme to mitigate soft errors in non-linear computing. According to our experiments, the proposed fault-tolerant approaches enhance ViT accuracy significantly with minor computing overhead in the presence of various soft errors.
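
To make the linear-computing protection concrete, below is a minimal sketch of classical checksum-based ABFT for a single GEMM, in the spirit of Huang and Abraham's scheme that the adaptive approach builds on. It is not the paper's exact adaptive variant: the detection threshold `tol` is a hypothetical value that a real design would calibrate to the GEMM size and floating-point round-off.

```python
import numpy as np

def abft_gemm(A, B, tol=1e-3):
    """Checksum-protected GEMM (classical ABFT sketch).

    Appends a column-sum row to A and a row-sum column to B, so the
    product carries checksums that can locate and correct a single
    corrupted element of C. `tol` is a hypothetical threshold for
    float round-off, not a value from the paper.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    # Column-checksum extension of A: one extra row of column sums.
    A_c = np.vstack([A, A.sum(axis=0, keepdims=True)])
    # Row-checksum extension of B: one extra column of row sums.
    B_r = np.hstack([B, B.sum(axis=1, keepdims=True)])
    C_full = A_c @ B_r              # (m+1) x (n+1) full-checksum product
    C = C_full[:m, :n]
    # Residuals between recomputed sums and the carried checksums.
    row_res = C.sum(axis=1) - C_full[:m, n]
    col_res = C.sum(axis=0) - C_full[m, :n]
    bad_rows = np.where(np.abs(row_res) > tol)[0]
    bad_cols = np.where(np.abs(col_res) > tol)[0]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # Single faulty element: the row residual equals the error.
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] -= row_res[i]
    return C
```

The checksum rows and columns add only O(mk + kn + mn) work on top of the O(mkn) GEMM, which is why ABFT suits the GEMM-dominated attention and FC layers; the adaptive element in the paper concerns choosing the protection granularity per GEMM size.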

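For the non-linear computing, a range-based protector can be sketched as follows. This is an illustrative filter in the style of range-restriction correctors such as Ranger, not the paper's exact rule; the bounds `lo` and `hi` are hypothetical and would be profiled offline on fault-free runs of the target layer.

```python
import numpy as np

def range_protect(x, lo, hi):
    """Zero out activations that fall outside a profiled [lo, hi] range,
    treating them as fault-corrupted (range-restriction sketch)."""
    return np.where((x < lo) | (x > hi), 0.0, x)

def protected_softmax(logits, lo=-50.0, hi=50.0):
    """Softmax with range-based input filtering.

    lo/hi are hypothetical bounds; a deployment would profile them
    per layer on clean data.
    """
    z = range_protect(logits, lo, hi)
    z = z - z.max(axis=-1, keepdims=True)   # standard numerical stability shift
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Because softmax and GELU amplify large-magnitude corrupted inputs (e.g., through the exponential), clamping or zeroing out-of-range values before the non-linearity is a cheap way to keep a single flipped bit from dominating the attention distribution.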