Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily deliver an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach spanning the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator baseline, fine-grained, heterogeneous datatype optimization reduces power by 1.5×; aggressive, inline predication and pruning of small activation values cuts power by a further 2.0×; and active hardware fault detection coupled with domain-aware error mitigation saves an additional 2.7× by lowering SRAM voltages. Across five datasets, these optimizations provide a collective average 8.1× power reduction over the accelerator baseline without compromising DNN model accuracy. Minerva enables highly accurate, ultra-low-power DNN accelerators (in the range of tens of milliwatts), making it feasible to deploy DNNs in power-constrained IoT and mobile devices.
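To make the abstract's first two optimizations concrete, below is a minimal NumPy sketch of per-layer fixed-point quantization and near-zero activation pruning applied to a single fully connected layer. The function names, the Q2.6 format, and the 0.05 pruning threshold are illustrative assumptions, not Minerva's actual datatypes or implementation; in the paper these parameters are chosen per layer by an automated design-space exploration flow.

```python
import numpy as np

def quantize_fixed_point(x, int_bits, frac_bits):
    """Simulate a signed fixed-point format with int_bits integer bits
    and frac_bits fractional bits (round, then clip to the representable range)."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

def relu_with_pruning(x, threshold):
    """ReLU that also zeroes activations below a small threshold, so the
    corresponding downstream MACs can be predicated away in hardware."""
    y = np.maximum(x, 0.0)
    y[y < threshold] = 0.0
    return y

def fc_layer(a_in, W, b, q_fmt, prune_thresh):
    """One fully connected layer with quantized weights/activations and pruning."""
    int_bits, frac_bits = q_fmt
    Wq = quantize_fixed_point(W, int_bits, frac_bits)
    z = a_in @ Wq + b
    zq = quantize_fixed_point(z, int_bits, frac_bits)
    return relu_with_pruning(zq, prune_thresh)

# Toy usage: one 256->100 layer with an assumed Q2.6 format and 0.05 threshold.
rng = np.random.default_rng(0)
a0 = rng.random((1, 256))
W = rng.normal(0.0, 0.1, (256, 100))
b = np.zeros(100)
a1 = fc_layer(a0, W, b, q_fmt=(2, 6), prune_thresh=0.05)
print("fraction of activations pruned to zero:", np.mean(a1 == 0.0))
```

In hardware, the zeroed activations let the datapath skip the corresponding multiply-accumulates and SRAM accesses, which is the mechanism behind the 2.0× power saving the abstract attributes to predication and pruning.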
