DRIS-3: Deep Neural Network Reliability Improvement Scheme in 3D Die-Stacked Memory based on Fault Analysis

Various studies have been carried out to improve the operational efficiency of the Deep Neural Networks (DNNs). However, the importance of the reliability in DNNs has generally been overlooked. As the underlying semiconductor technology decreases in reliability, the probability that some components of computing devices fail also increases, preventing high accuracy in DNN operations. To achieve high accuracy, ensuring operational reliability, even if faults occur, is necessary.In this paper, we introduce a DNN reliability improvement scheme in 3D die-stacked memory called DRIS-3, based on the correlation between the faults in weights and an accuracy loss. We analyze the fault characteristics of conventional DNN models to find the bits that cause significant accuracy loss when faults are injected into weights. On the basis of the findings, we propose a reliability improvement structure which can reduce faults on the bits that must be protected for accuracy, considering asymmetric soft error rate (SER) per layer in 3D die-stacked memory. Experimental results show that with the proposed method, the fault tolerance is improved regardless of the type of model and the pruning applied. The fault tolerance based on bit error rate (BER) for a 1% accuracy loss is increased up to 104 times over the conventional model. CCS CONCEPTS • Computer systems organization → Neural networks; • Hardware → Fault tolerance;

[1]  Cristian Constantinescu,et al.  Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.

[2]  Guanpeng Li,et al.  Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[3]  Keith Kim,et al.  HBM (High Bandwidth Memory) DRAM Technology and Architecture , 2017, 2017 IEEE International Memory Workshop (IMW).

[4]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[5]  Gabriel H. Loh,et al.  Efficient RAS support for die-stacked DRAM , 2014, 2014 International Test Conference.

[6]  Moinuddin K. Qureshi,et al.  Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[7]  Caro Lucas,et al.  Relaxed Fault-Tolerant Hardware Implementation of Neural Networks in the Presence of Multiple Transient Errors , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[8]  Joon-Sung Yang,et al.  READ: Reliability Enhancement in 3D-Memory Exploiting Asymmetric SER Distribution , 2018, IEEE Transactions on Computers.

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  Gu-Yeon Wei,et al.  Ares: A framework for quantifying the resilience of deep neural networks , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Gabriel H. Loh A register-file approach for row buffer caches in die-stacked DRAMs , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[14]  Vivienne Sze,et al.  Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.

[15]  Tino Heijmen,et al.  Analytical semi-empirical model for SER sensitivity estimation of deep-submicron CMOS circuits , 2005, 11th IEEE International On-Line Testing Symposium.

[16]  R. Baumann The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction , 2002, Digest. International Electron Devices Meeting,.

[17]  Tao Li,et al.  Microarchitecture soft error vulnerability characterization and mitigation under 3D integration technology , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.