Resource-Aware Online Permanent Fault Detection Mechanism for Streaming Convolution Engine in Edge AI Accelerators

Edge AI accelerators have gained popularity in applications such as image recognition sensors, remote sensing satellites, robotics, wearable devices, and drones because of their compact size and low power consumption. However, these applications demand fault tolerance, and hence reliability, against defects caused by radiation or introduced during manufacturing, especially in hard-to-reach environments such as space or nuclear power stations. This paper presents an online permanent fault detection mechanism for streaming convolution engines in edge AI accelerators. The mechanism adds extra comparison modules to the processing elements (PEs) of a convolution engine. Experimental results show that the mechanism incurs low overhead in both hardware resources and power consumption: the resource overhead is less than 3.6%, and the power overhead is at most 1.2%.
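
The general idea of comparison-based detection of a permanent fault can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not specify what the comparison modules compare against, so the fault-free reference path, the 16-bit unsigned accumulator, the stuck-at fault model on a single output bit, and all function names (`mac`, `compare`, `detect_fault`) are hypothetical choices made for illustration.

```python
def mac(weight, pixel, acc, stuck_bit=None, stuck_value=0, width=16):
    """Multiply-accumulate step of one hypothetical PE (unsigned, `width` bits).

    If `stuck_bit` is given, that output bit is forced to `stuck_value`,
    modelling a permanent stuck-at fault (an assumed fault model).
    """
    mask = (1 << width) - 1
    result = (acc + weight * pixel) & mask
    if stuck_bit is not None:
        if stuck_value:
            result |= 1 << stuck_bit      # stuck-at-1
        else:
            result &= ~(1 << stuck_bit)   # stuck-at-0
    return result


def compare(pe_out, reference_out):
    """Hypothetical comparison module: flag a fault when the results differ."""
    return pe_out != reference_out


def detect_fault(weights, pixels, stuck_bit=None, stuck_value=0):
    """Stream weight/pixel pairs through a possibly faulty PE and a fault-free
    reference, checking after every step. Returns True once a mismatch is seen."""
    acc_pe = 0
    acc_ref = 0
    for w, p in zip(weights, pixels):
        acc_pe = mac(w, p, acc_pe, stuck_bit, stuck_value)
        acc_ref = mac(w, p, acc_ref)  # assumed fault-free reference path
        if compare(acc_pe, acc_ref):
            return True
    return False


if __name__ == "__main__":
    weights, pixels = [1, 2, 3], [4, 5, 6]
    print(detect_fault(weights, pixels))                              # healthy PE
    print(detect_fault(weights, pixels, stuck_bit=0, stuck_value=1))  # stuck-at-1 on bit 0
    print(detect_fault(weights, pixels, stuck_bit=15, stuck_value=0)) # stuck-at-0 on bit 15
```

In this model a fault is flagged only when the streamed data actually exercises the faulty bit: the stuck-at-0 fault on bit 15 goes unnoticed for these small inputs because that bit is never set in the correct result.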
