ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators

Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper, we propose ThUnderVolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that ThUnderVolt enables 34%–57% energy savings on state-of-the-art speech and image recognition benchmarks with less than 1% loss in classification accuracy and no performance loss. Further, we show that ThUnderVolt is synergistic with commonly used run-time DNN pruning techniques such as Zero-Skip, and can further increase their energy efficiency.
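To make the two ideas in the abstract concrete, the sketch below is a minimal software illustration of how run-time Zero-Skip pruning and a drop-on-timing-error policy might interact inside a multiply-accumulate (MAC) loop. The Bernoulli error model, the drop policy, and the function name `mac_with_zero_skip` are illustrative assumptions for exposition, not the paper's hardware design.

```python
# Minimal software sketch (not the paper's RTL): emulating how run-time
# Zero-Skip pruning and an assumed "drop the erroneous update" policy
# could interact inside a MAC (multiply-accumulate) loop.
import random

def mac_with_zero_skip(activations, weights, error_rate=0.05, seed=0):
    """Accumulate sum(a * w), skipping zero activations (Zero-Skip).

    With probability `error_rate`, a MAC is assumed to miss timing at the
    underscaled voltage; its partial-sum update is dropped rather than
    recomputed, trading a small numerical error for avoided re-execution.
    """
    rng = random.Random(seed)
    acc = 0
    mac_ops = 0  # energy proxy: number of MACs actually performed
    for a, w in zip(activations, weights):
        if a == 0:
            continue  # Zero-Skip: a zero operand makes the MAC ineffectual
        mac_ops += 1
        if rng.random() < error_rate:
            continue  # assumed policy: drop the update that missed timing
        acc += a * w
    return acc, mac_ops

# Example: ReLU-style activations are frequently zero, so many MACs are skipped.
acts = [0, 3, 0, 0, 7, 1, 0, 2]
wts  = [5, 2, 4, 1, 3, 6, 2, 8]
print(mac_with_zero_skip(acts, wts))
```

Because skipped MACs consume no compute and dropped updates avoid re-execution, both mechanisms reduce energy; the paper's contribution is showing that the resulting numerical perturbations leave classification accuracy essentially intact.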
