HEIF: Highly Efficient Stochastic Computing-Based Inference Framework for Deep Neural Networks

Deep convolutional neural networks (DCNNs) are among the most promising deep learning techniques and have become the dominant approach for almost all recognition and detection tasks. DCNN computation is memory-intensive due to large feature maps and dense neuron connections, and performance depends heavily on the available hardware resources. With the recent trend toward wearable devices and the Internet of Things, it becomes desirable to integrate DCNNs onto embedded and portable devices that require low power and energy consumption and a small hardware footprint. Recently, SC-DCNN demonstrated that stochastic computing (SC), as a low-cost substitute for conventional binary computing, radically simplifies the hardware implementation of arithmetic units and has the potential to satisfy the stringent power requirements of embedded devices. In SC, many arithmetic operations that are resource-consuming in binary designs can be implemented with very simple hardware logic, alleviating their extensive computational complexity. SC also offers a vast design space for integration and optimization owing to its reduced area and soft-error resiliency. In this paper, we present HEIF, a highly efficient SC-based inference framework for large-scale DCNNs, with broad applications including (but not limited to) LeNet-5 and AlexNet, that achieves high energy efficiency and low area/hardware cost. Compared to SC-DCNN, HEIF features: 1) the first (to the best of our knowledge) SC-based rectified linear unit (ReLU) activation function, which catches up with recent advances in software models and mitigates degradation in application-level accuracy; 2) a redesigned approximate parallel counter (APC) and optimized stochastic multiplication using transmission gates and inverse mirror adders; and 3) a new optimization of weight storage using clustering. Most importantly, to achieve maximum energy efficiency while maintaining acceptable accuracy, HEIF performs holistic optimization over the cascade connection of function blocks in the DCNN, the pipelining technique, and bit-stream length reduction. Experimental results show that on large-scale applications HEIF outperforms the prior SC-DCNN in throughput by $4.1\times$, in area efficiency by up to $6.5\times$, and in energy efficiency by up to $5.6\times$.
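The abstract's central claim, that SC replaces costly binary arithmetic with trivial logic, is easiest to see in code. Below is a minimal Python sketch, not the paper's hardware design, of bipolar stochastic multiplication, where a multiply reduces to a single XNOR gate per bit, and of an APC-style inner product, where per-cycle popcounts stand in for the approximate parallel counter. The stream length, RNG seeding, and software popcount are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bipolar(x, length):
    """Encode x in [-1, 1] as a bit-stream where P(bit = 1) = (x + 1) / 2."""
    return rng.random(length) < (x + 1) / 2

# Multiplication in bipolar SC is a bitwise XNOR of two independent streams:
# if a = 2*pa - 1 and b = 2*pb - 1, the XNOR stream decodes to a * b.
def sc_mul(a_bits, b_bits):
    return ~(a_bits ^ b_bits)

# Inner product, APC style: XNOR every weight/input pair in parallel, then
# count the ones in each clock cycle (the parallel counter's job) and
# accumulate the counts into a binary sum.
def sc_inner_product(x, w, length=1024):
    x_bits = np.stack([to_bipolar(v, length) for v in x])
    w_bits = np.stack([to_bipolar(v, length) for v in w])
    prods = sc_mul(x_bits, w_bits)   # shape: (n, length)
    counts = prods.sum(axis=0)       # per-cycle popcount
    n = len(x)
    # Each product decodes to 2*P(1) - 1, so the sum of n products is
    # 2 * (total ones) / length - n.
    return 2 * counts.sum() / length - n

x = np.array([0.5, -0.3, 0.8])
w = np.array([-0.2, 0.6, 0.4])
print("SC estimate:", sc_inner_product(x, w))
print("exact:     ", float(x @ w))
```

The estimate's standard deviation shrinks only as $1/\sqrt{L}$ with stream length $L$, so halving the error costs roughly $4\times$ the latency and energy; this is why the bit-stream length reduction the abstract mentions matters so much for efficiency.

The third listed feature, weight storage optimized by clustering, plausibly amounts to weight sharing: cluster the weights and store a small codebook plus per-weight cluster indices. The sketch below is a generic k-means reading of that idea, not the paper's exact scheme; the cluster count, iteration count, and layer shape are made-up parameters.

```python
import numpy as np

def cluster_weights(weights, k=8, iters=20, seed=0):
    """Quantize a weight array to k shared values with a minimal k-means.
    Only the k centroids plus log2(k)-bit indices need to be stored,
    instead of one full-precision value per weight."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    centroids = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned weights.
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = flat[idx == j].mean()
    return centroids, idx.reshape(weights.shape)

w = np.random.default_rng(1).normal(scale=0.1, size=(64, 64))
cents, idx = cluster_weights(w, k=8)
w_q = cents[idx]  # reconstructed (quantized) weights
print("max abs error:", np.abs(w - w_q).max())
```

With $k = 8$ clusters, each weight index fits in 3 bits, which is where the storage saving comes from; the accuracy impact depends on how tolerant the network is to the resulting quantization error.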
