Lightweight Deep Neural Network Accelerators Using Approximate SW/HW Techniques

Deep neural networks (DNNs) provide state-of-the-art accuracy in many application domains, such as computer vision and speech recognition. At the same time, DNNs require millions of expensive floating-point operations to process each input, which limits their applicability to resource-constrained systems with tight budgets on hardware area or power consumption. Our goal is to devise lightweight, approximate accelerators for DNN acceleration that use fewer hardware resources with negligible reduction in accuracy. To simplify the hardware requirements, we analyze a spectrum of data precisions ranging from fixed-point and dynamic fixed-point to powers-of-two and binary representations. In conjunction, we provide new training methods that compensate for the simpler hardware resources. To boost the accuracy of the proposed lightweight accelerators, we describe ensemble processing techniques in which an ensemble of lightweight DNN accelerators achieves the same or better accuracy than the original floating-point accelerator while still using far fewer hardware resources. Using 65 nm technology libraries and an industrial-strength design flow, we demonstrate a custom hardware accelerator design and training procedure that achieve low power and low latency while incurring insignificant accuracy degradation. We evaluate our designs and techniques on the CIFAR-10 and ImageNet datasets and show that significant reductions in power and inference latency are realized.
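
To make the precision spectrum concrete, the sketch below illustrates, in plain NumPy rather than the authors' code, how a trained weight tensor might be mapped to fixed-point, power-of-two, and binary representations. The bit widths, exponent ranges, and scaling rules are illustrative assumptions, not values taken from the paper; the power-of-two case is the one that lets a hardware multiplier be replaced by an arithmetic shift.

```python
import numpy as np

def quantize_fixed_point(w, int_bits=2, frac_bits=6):
    """Round weights to a fixed-point grid with the given integer/fractional bits (assumed widths)."""
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 2.0 ** (-frac_bits)
    return np.clip(np.round(w * scale) / scale, -max_val, max_val)

def quantize_power_of_two(w, min_exp=-7, max_exp=0):
    """Snap each weight to the nearest signed power of two, so multiplications
    in the accelerator reduce to arithmetic shifts. Exponent range is an assumption."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * 2.0 ** exp

def quantize_binary(w):
    """Binarize weights to {-alpha, +alpha}, with alpha set to the mean magnitude
    (one common scaling choice, not necessarily the paper's)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)
```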

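The ensemble idea reduces to a simple combination rule: run the input through several independently trained lightweight networks and fuse their outputs. The NumPy sketch below shows one common fusion choice, probability averaging; the actual combination scheme, ensemble size, and model names used here are assumptions for illustration.

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class-probability outputs of several lightweight (quantized) models.

    `models` is assumed to be a list of callables mapping an input batch to
    per-class probabilities; averaging is one standard combination rule.
    """
    probs = [m(x) for m in models]
    return np.mean(probs, axis=0)

# Hypothetical usage: three low-precision copies of a network, trained
# independently, voting by averaged probabilities.
# prediction = np.argmax(ensemble_predict([net_a, net_b, net_c], batch), axis=1)
```
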