Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors

Deep Neural Networks (DNNs) represent the state of the art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption, however, is limited by their high computation and storage requirements, especially for energy-constrained inference tasks at the edge on wearable and IoT devices. One promising approach to alleviating the computational challenges is implementing DNNs using a low-precision fixed-point (<16 bits) representation. However, the quantization error inherent in any fixed-point (FxP) implementation limits the choice of bit-widths that maintain application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize the loss due to quantization, albeit with limited success. Complementary to the above approaches, we present Compensated-DNN, wherein we propose to dynamically compensate for the error introduced by quantization during execution. To this end, we introduce a new fixed-point representation, viz. Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. The compensation bits (at most 1 or 2 bits), on the other hand, explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, since FPEC uses fewer computation bits than the FxP representation, we achieve a near-quadratic improvement in the energy of multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization. We build Compensated-DNNs for 7 popular image recognition benchmarks with 0.05–20.5 million neurons and 0.01–15.5 billion connections. Based on gate-level analysis at 14 nm technology, we achieve 2.65×–4.88× and 1.13×–1.7× improvements in energy compared to 16-bit and 8-bit FxP implementations, respectively, while maintaining <0.5% loss in classification accuracy.
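As a rough illustration of the FPEC idea described above, the sketch below quantizes values to a low-precision fixed-point format, derives a 2-bit (direction plus coarse magnitude) error estimate, and applies a first-order compensation to a dot product. The bit-widths, the magnitude threshold, and the compensation formula are illustrative assumptions, not the paper's exact encoding or hardware scheme.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's exact FPEC format):
FRAC_BITS = 4                 # fractional bits of the low-precision FxP value
SCALE = 2 ** FRAC_BITS

def fpec_quantize(x):
    """Quantize x to low-precision fixed point and derive compensation bits."""
    q = np.round(x * SCALE) / SCALE               # computation bits: plain FxP value
    err = x - q                                   # true quantization error
    direction = np.sign(err)                      # 1 compensation bit: error direction
    magnitude = np.abs(err) > 0.25 / SCALE        # 1 compensation bit: coarse magnitude
    # Error estimate reconstructed from the two compensation bits
    # (midpoint of the coarse magnitude bin).
    est = direction * np.where(magnitude, 0.375, 0.125) / SCALE
    return q, est

def compensated_dot(w, a):
    """Dot product on quantized operands with first-order error compensation."""
    wq, we = fpec_quantize(np.asarray(w, dtype=np.float64))
    aq, ae = fpec_quantize(np.asarray(a, dtype=np.float64))
    mac = np.dot(wq, aq)                          # low-precision MAC result
    # Compensation term: wq*ae + we*aq; the second-order we*ae term is dropped,
    # in the spirit of the paper's low-overhead sparse compensation scheme.
    return mac + np.dot(wq, ae) + np.dot(we, aq)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(64)
    a = rng.standard_normal(64)
    print("exact      :", np.dot(w, a))
    print("plain FxP  :", np.dot(fpec_quantize(w)[0], fpec_quantize(a)[0]))
    print("compensated:", compensated_dot(w, a))
```

Note that the compensation term adds only multiply-free-ish corrections on already-quantized operands, which hints at why such a scheme can stay low-overhead compared to simply widening the computation bits.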
