Efficient Deep Convolutional Neural Networks Accelerator without Multiplication and Retraining

Recently, low-precision weight methods have been considered a promising scheme for efficiently implementing inference of deep convolutional neural networks (DCNNs), but they suffer from expensive retraining costs and accuracy degradation. In this paper, a low-bit, retraining-free quantization method is proposed that enables DCNNs to perform inference using only shift and add operations, and its efficiency is demonstrated in terms of power consumption and chip area. Huffman coding is adopted for further compression. Then, by exploiting a two-level systolic structure, an efficient hardware accelerator tailored to the proposed quantization strategy is introduced. Experimental results show that, without any retraining, our method achieves higher accuracy on ImageNet than other low-precision networks. A 5× to 8× compression ratio is obtained on popular models compared with their full-precision counterparts. Furthermore, the hardware implementation shows a substantial reduction in slices while maintaining throughput.

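The following is a minimal sketch, not the paper's exact scheme, of how retraining-free power-of-two quantization turns each multiply-accumulate into a shift and an add. The function names `quantize_to_power_of_two` and `shift_add_dot`, the 4-bit budget, and the exponent/zero encoding are illustrative assumptions; the Huffman compression stage and the two-level systolic mapping are omitted here.

```python
import numpy as np

def quantize_to_power_of_two(weights, num_bits=4):
    """Snap each pretrained weight to a signed power of two (no retraining).

    Hypothetical sketch: the exponent range is derived from the largest
    weight magnitude and limited by the bit budget; weights far below the
    smallest representable power of two are pruned to zero.
    """
    weights = np.asarray(weights, dtype=np.float64)
    signs = np.sign(weights)
    mags = np.abs(weights)

    max_exp = int(np.floor(np.log2(mags.max())))       # largest exponent (assumption)
    min_exp = max_exp - (2 ** (num_bits - 1) - 2)      # reserve one code for zero

    # Nearest power of two chosen in the log domain (a common approximation).
    exps = np.round(np.log2(np.maximum(mags, 2.0 ** (min_exp - 4))))
    exps = np.clip(exps, min_exp, max_exp).astype(int)

    prune = mags < 2.0 ** (min_exp - 1)                # tiny weights -> zero code
    signs = np.where(prune, 0, signs).astype(int)
    quantized = signs * np.power(2.0, exps)
    return quantized, exps, signs


def shift_add_dot(activations_q, exps, signs):
    """Multiplier-free dot product: each MAC becomes a shift and an add."""
    acc = 0
    for a, e, s in zip(activations_q, exps, signs):
        if s == 0:
            continue                                   # pruned weight contributes nothing
        shifted = a << e if e >= 0 else a >> -e        # negative exponent = right shift
        acc += shifted if s > 0 else -shifted
    return acc


# Toy usage: quantize a few weights and accumulate against fixed-point activations.
w = np.array([0.31, -0.07, 0.55, 0.002])
wq, exps, signs = quantize_to_power_of_two(w, num_bits=4)
acts = [120, 45, -3, 77]
print(wq, shift_add_dot(acts, exps, signs))
```

In this sketch the quantizer only reads the pretrained weights, so no training data or fine-tuning pass is needed, and a Huffman code over the resulting (sign, exponent) symbols would shrink the stored model further; in hardware, the right shifts would be realized with a wider fixed-point accumulator rather than by truncating as this toy code does.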