MixNet: An Energy-Scalable and Computationally Lightweight Deep Learning Accelerator

In this paper, we present a lightweight CNN model, named MixNet, that scales easily to different energy budgets on embedded platforms. MixNet uses two extreme bit-precisions that together balance model accuracy against energy consumption: the energy spent processing MixNet is managed by controlling the ratio between the high-precision (16-bit) and low-precision (1-bit) paths. Because a hardware accelerator needs to support only two bit-precisions, its control logic is simpler than that of other multi-precision accelerators. In addition, we propose a reconfigurable multiplier that enables highly parallel MixNet computation for faster inference and/or training. Overall, energy efficiency, measured as run time per unit power, improves by 1.75× to 1.94× over a recently proposed reduced-precision CNN model.
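
To make the two-precision idea concrete, the following minimal Python/NumPy sketch shows a layer whose output channels are split between a 1-bit (binarized) path and a 16-bit fixed-point path, with the split ratio acting as the energy/accuracy knob. This is an illustration under stated assumptions, not the authors' implementation: the Q8.8 fixed-point format, the channel-wise split, and the 16:1 per-operation energy ratio are all assumed for the example.

```python
# Minimal sketch (not the paper's implementation) of a layer split between
# a 1-bit path and a 16-bit path. The ratio r is the energy/accuracy knob.
import numpy as np

def quantize_16bit(x, frac_bits=8):
    """Round to a 16-bit fixed-point grid (Q8.8 here, an assumption)."""
    scale = 1 << frac_bits
    q = np.clip(np.round(x * scale), -(1 << 15), (1 << 15) - 1)
    return q / scale

def binarize(x):
    """Constrain values to +1/-1, as in BinaryNet-style layers."""
    return np.where(x >= 0, 1.0, -1.0)

def mixed_precision_fc(x, W, r):
    """Fully connected layer with a fraction r of 1-bit output channels.

    x : (in_features,) activation vector
    W : (out_features, in_features) weight matrix
    r : fraction of output channels routed through the 1-bit path
    """
    out_features = W.shape[0]
    n_low = int(round(r * out_features))  # channels on the 1-bit path
    # Low-precision path: binary weights and binary activations.
    y_low = binarize(W[:n_low]) @ binarize(x)
    # High-precision path: 16-bit fixed-point weights and activations.
    y_high = quantize_16bit(W[n_low:]) @ quantize_16bit(x)
    return np.concatenate([y_low, y_high])

def relative_energy(r, e_ratio=16.0):
    """Toy energy proxy: assume one 16-bit MAC costs e_ratio 1-bit ops."""
    return r * 1.0 + (1.0 - r) * e_ratio

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W = rng.standard_normal((32, 64))
for r in (0.0, 0.5, 1.0):
    y = mixed_precision_fc(x, W, r)
    print(f"r={r:.1f}  energy={relative_energy(r):5.1f}  |y|={np.linalg.norm(y):.2f}")
```

Sweeping r from 0 (all channels 16-bit) to 1 (all channels 1-bit) trades output fidelity for the toy energy proxy, mirroring how the paper manages energy by controlling the ratio between its two precision paths.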
