A Hybrid Computing Architecture for Fault-tolerant Deep Learning Accelerators

Regular 2D computing arrays are widely used to process the major neural network operations in many deep learning accelerators (DLAs). Hardware failures in the array can cause considerable computing errors and prediction accuracy loss. Prior works add homogeneous redundant PEs to each row or column of the regular computing array to mitigate faulty PEs, but they fail to recover the array whenever the number of faulty PEs in a row or column exceeds the number of redundant PEs in that row or column. The problem worsens when faults are unevenly distributed across the array. To address this problem, we propose a hybrid computing architecture (HCA) for fault-tolerant DLAs. Instead of adding homogeneous redundant PEs to the regular computing array, HCA includes a dot-product processing unit (DPPU) that concurrently recomputes the operations mapped to faulty PEs, incurring no performance penalty under moderate fault injection. Even under high fault injection, HCA degrades gracefully and remains functional. In addition, the DPPU exploits the parallelism within each operation and processes operations sequentially, so it can tolerate faulty PEs in arbitrary locations and delivers steady performance under distinct fault distributions. Our experiments show that HCA achieves significantly higher reliability and performance under various fault injection rates, with chip area overhead comparable to conventional redundancy approaches.
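The core idea, recomputing only the results mapped to faulty PEs on a separate sequential dot-product unit, can be illustrated with a minimal functional sketch. This is an assumption-laden model, not the paper's implementation: it assumes an output-stationary mapping in which each PE produces one dot product of the output matrix, and the function and variable names (`faulty_array_matmul`, `dppu_recompute`, `fault_mask`) are illustrative.

```python
import numpy as np

def faulty_array_matmul(A, B, fault_mask):
    """Model a 2D PE array computing C = A @ B, where each PE (i, j)
    produces C[i, j]. Entries where fault_mask is True stand in for
    corrupted results from faulty PEs (modeled here as zeros)."""
    C = A @ B
    C = C.copy()
    C[fault_mask] = 0.0  # corrupted outputs of faulty PEs
    return C

def dppu_recompute(A, B, C, fault_mask):
    """Stand-in for the DPPU: sequentially recompute only the dot
    products that were mapped to faulty PEs, regardless of where
    those PEs sit in the array."""
    for i, j in zip(*np.nonzero(fault_mask)):
        C[i, j] = np.dot(A[i, :], B[:, j])
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
B = rng.standard_normal((8, 4))

# Two faulty PEs in arbitrary (non-aligned) locations -- the case
# that defeats per-row/per-column spare PEs when spares run out.
fault_mask = np.zeros((4, 4), dtype=bool)
fault_mask[1, 2] = True
fault_mask[3, 0] = True

C_faulty = faulty_array_matmul(A, B, fault_mask)
C_fixed = dppu_recompute(A, B, C_faulty.copy(), fault_mask)
assert np.allclose(C_fixed, A @ B)  # recomputation restores correctness
```

Because the recompute unit walks the fault mask sequentially, its cost depends only on the number of faulty PEs, not on how they are distributed across rows and columns, which mirrors the steady-performance claim in the abstract.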
