SalvageDNN: salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping

Deep neural networks (DNNs) have proliferated across application domains involving data processing, predictive analysis and knowledge inference. Alongside the need for highly performance-efficient DNN accelerators, there is a pressing need to improve manufacturing yield in order to reduce the per-unit cost of these accelerators. To this end, we present ‘SalvageDNN’, a methodology that enables reliable execution of DNNs on hardware accelerators with permanent faults, which typically arise from imperfect manufacturing processes. SalvageDNN maps the different parts of a given DNN onto the fault-affected hardware accelerator in a fault-aware manner, leveraging the saliency of the DNN parameters together with the fault map of the underlying processing hardware. We also present novel modifications to a systolic array design that further improve accelerator yield while ensuring reliable DNN execution under ‘SalvageDNN’ with negligible area, power/energy and performance overheads. This article is part of the theme issue ‘Harmonizing energy-autonomous computing and intelligence’.
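To make the core idea concrete, the sketch below shows one plausible realization of saliency-driven fault-aware mapping. It assumes a row-granularity mapping in which each row of a layer's weight matrix is assigned to one row of processing elements (PEs), and it uses the L1 norm as a saliency proxy; the granularity, the saliency heuristic and all names (e.g. `fault_aware_mapping`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def saliency_per_row(weights):
    # L1 norm of each weight row as a saliency proxy (a standard pruning heuristic;
    # an assumption here, not necessarily the paper's saliency measure).
    return np.abs(weights).sum(axis=1)

def fault_aware_mapping(weights, faulty_rows):
    """Permute weight rows so the least salient rows land on faulty PE rows,
    then zero those rows so the faulty PEs can simply be bypassed.
    Returns the remapped weights and the permutation (hardware row -> weight row)."""
    n = weights.shape[0]
    order = np.argsort(saliency_per_row(weights))      # weight rows, least salient first
    faulty = sorted(faulty_rows)
    healthy = [r for r in range(n) if r not in faulty_rows]
    perm = np.empty(n, dtype=int)
    for hw_row, w_row in zip(faulty, order[:len(faulty)]):
        perm[hw_row] = w_row                           # weakest rows -> faulty PEs
    for hw_row, w_row in zip(healthy, order[len(faulty):]):
        perm[hw_row] = w_row                           # remaining rows -> healthy PEs
    mapped = weights[perm].copy()
    mapped[faulty] = 0.0                               # prune the weights on faulty PEs
    return mapped, perm

# Demo: map an 8x16 weight matrix onto an array whose PE rows 2 and 5 are faulty.
W = np.random.randn(8, 16).astype(np.float32)
mapped_W, perm = fault_aware_mapping(W, {2, 5})
# Outputs computed with mapped_W are row-permuted; a deployment would apply the
# inverse permutation (np.argsort(perm)) to restore the original output order.
```

The appeal of a mapping of this style is that it would trade a small accuracy loss, confined to the least salient parameters, for tolerance of permanent faults without retraining, allowing dies with faulty PEs to be salvaged rather than discarded.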
