Dynamic Bit-width Reconfiguration for Energy-Efficient Deep Learning Hardware

Deep learning models have reached state-of-the-art performance in many machine learning tasks. Evaluating these models directly on Internet of Things end nodes, rather than in the cloud, yields benefits in energy, bandwidth, and latency. This calls for implementations of deep learning tasks that can run in resource-limited environments with a low energy footprint. Research and industry have recently investigated these aspects, producing specialized hardware accelerators for low-power deep learning. One effective technique adopted in these devices is reducing the bit-width of calculations, exploiting the error resilience of deep learning. However, bit-widths are typically set statically for a given model, regardless of input data. Unless models are retrained, this solution invariably sacrifices accuracy for energy efficiency. In this paper, we propose a new approach for implementing input-dependent dynamic bit-width reconfiguration in deep learning accelerators. Our method is based on a fully automatic characterization phase and can be applied to popular models without retraining. Using energy data from a real deep learning accelerator chip, we show that a 50% energy reduction can be achieved with respect to a static bit-width selection, with less than 1% accuracy loss.
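To make the idea of input-dependent bit-width reconfiguration concrete, the following minimal Python sketch shows one way a per-input bit-width could be chosen at inference time. It is not the paper's characterization-based method: the uniform quantizer, the single linear layer, the candidate bit-widths, and the top-2 score-margin confidence heuristic (`margin`) are all hypothetical stand-ins used only to illustrate the general mechanism of retrying a computation at wider bit-widths when a low-precision result does not look trustworthy.

```python
# Minimal sketch (hypothetical, not the paper's algorithm): input-dependent
# bit-width selection for a quantized linear classifier.
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return x
    return np.round(x / scale) * scale

def linear_layer(x, weights, bits):
    """Linear layer evaluated with inputs and weights quantized to `bits` bits."""
    return quantize(x, bits) @ quantize(weights, bits)

def classify_dynamic(x, weights, widths=(4, 8, 16), margin=0.5):
    """Try increasing bit-widths; accept the first prediction whose top-2 score
    margin exceeds a (hypothetical) confidence threshold."""
    for bits in widths:
        scores = linear_layer(x, weights, bits)
        top2 = np.sort(scores)[-2:]
        if top2[1] - top2[0] >= margin:          # confident enough at this width
            return int(np.argmax(scores)), bits
    return int(np.argmax(scores)), widths[-1]    # fall back to the widest setting

# Usage example with random data standing in for a trained classifier.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
weights = rng.standard_normal((64, 10))
label, used_bits = classify_dynamic(x, weights)
print(f"predicted class {label} using {used_bits}-bit arithmetic")
```

In this toy setting, "easy" inputs terminate at the narrowest bit-width and so cost less energy, while harder inputs escalate to wider arithmetic; the paper's contribution is to derive such per-input decisions automatically, without retraining, from a characterization phase and real accelerator energy data.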
