Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily deliver an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach spanning the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator baseline, fine-grained, heterogeneous datatype optimization reduces power by 1.5×; aggressive, inline predication and pruning of small activation values cuts power by a further 2.0×; and active hardware fault detection coupled with domain-aware error mitigation saves an additional 2.7× by lowering SRAM voltages. Across five datasets, these optimizations provide a collective average 8.1× power reduction over the accelerator baseline without compromising DNN model accuracy. Minerva enables highly accurate, ultra-low-power DNN accelerators (in the range of tens of milliwatts), making it feasible to deploy DNNs in power-constrained IoT and mobile devices.
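To make the abstract's first two optimizations concrete, below is a minimal NumPy sketch of per-layer fixed-point quantization and near-zero activation pruning applied to a single fully connected layer. The function names, the Q2.6 format, and the 0.05 pruning threshold are illustrative assumptions, not Minerva's actual datatypes or implementation; in the paper these parameters are chosen per layer by an automated design-space exploration flow.

```python
import numpy as np

def quantize_fixed_point(x, int_bits, frac_bits):
    """Simulate a signed fixed-point format with int_bits integer bits
    and frac_bits fractional bits (round, then clip to the representable range)."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

def relu_with_pruning(x, threshold):
    """ReLU that also zeroes activations below a small threshold, so the
    corresponding downstream MACs can be predicated away in hardware."""
    y = np.maximum(x, 0.0)
    y[y < threshold] = 0.0
    return y

def fc_layer(a_in, W, b, q_fmt, prune_thresh):
    """One fully connected layer with quantized weights/activations and pruning."""
    int_bits, frac_bits = q_fmt
    Wq = quantize_fixed_point(W, int_bits, frac_bits)
    z = a_in @ Wq + b
    zq = quantize_fixed_point(z, int_bits, frac_bits)
    return relu_with_pruning(zq, prune_thresh)

# Toy usage: one 256->100 layer with an assumed Q2.6 format and 0.05 threshold.
rng = np.random.default_rng(0)
a0 = rng.random((1, 256))
W = rng.normal(0.0, 0.1, (256, 100))
b = np.zeros(100)
a1 = fc_layer(a0, W, b, q_fmt=(2, 6), prune_thresh=0.05)
print("fraction of activations pruned to zero:", np.mean(a1 == 0.0))
```

In hardware, the zeroed activations let the datapath skip the corresponding multiply-accumulates and SRAM accesses, which is the mechanism behind the 2.0× power saving the abstract attributes to predication and pruning.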
