Flexible Multi-Precision Accelerator Design for Deep Convolutional Neural Networks Considering Both Data Computation and Communication

Owing to the rapid advances in deep convolutional neural networks (CNNs), hardware acceleration of convolution computations on edge devices is crucial for many artificial-intelligence applications. This paper presents a CNN accelerator design that supports various CNN filter sizes and strides as well as different bit precisions. In particular, we analyze the latencies of data communication and computation to determine the precision that maximizes the utilization of the available hardware resources. The proposed design supports data precisions of 8 and 16 bits and weight precisions of 2, 4, 8, and 16 bits for popular CNN models, and it effectively speeds up low-precision computation by exploiting the additional parallelism.
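To make the precision-selection idea concrete, the sketch below (in Python) estimates, for a single convolution layer, the computation and communication latencies at each supported weight precision and picks the precision whose bottleneck latency is lowest. The PE count, bus width, and the simple cycle model are hypothetical illustrations, not parameters of the proposed accelerator; the point is only that the extra parallelism of a narrower weight precision pays off until the layer becomes communication bound.

```python
# Illustrative sketch only: a simplified latency model for choosing the weight
# precision of one convolution layer. All hardware parameters (PE count, base
# multiplier width, bus bandwidth) are hypothetical, not taken from the paper.

def estimate_latency(macs, num_weights, act_elems, weight_bits, act_bits,
                     pe_count=256, base_bits=16, bus_bytes_per_cycle=8):
    """Return (compute_cycles, comm_cycles) for one convolution layer."""
    # Lower weight precision packs more multiplications into each PE,
    # so compute throughput scales with base_bits // weight_bits.
    parallel_factor = pe_count * (base_bits // weight_bits)
    compute_cycles = macs / parallel_factor
    # Off-chip traffic: activations plus weights, scaled by their bit widths.
    traffic_bytes = act_elems * act_bits / 8 + num_weights * weight_bits / 8
    comm_cycles = traffic_bytes / bus_bytes_per_cycle
    return compute_cycles, comm_cycles


def pick_weight_precision(macs, num_weights, act_elems, act_bits=8,
                          candidates=(2, 4, 8, 16)):
    """Choose the weight precision whose bottleneck latency is lowest.

    The effective latency is the slower of computation and communication,
    so narrowing the weights helps only until the layer becomes
    communication bound.
    """
    best_bits, best_latency = None, float("inf")
    for wbits in candidates:
        comp, comm = estimate_latency(macs, num_weights, act_elems,
                                      wbits, act_bits)
        latency = max(comp, comm)
        if latency < best_latency:
            best_bits, best_latency = wbits, latency
    return best_bits, best_latency


if __name__ == "__main__":
    # Example: a 3x3 convolution, 64 input / 128 output channels, 56x56 output map.
    macs = 3 * 3 * 64 * 128 * 56 * 56
    num_weights = 3 * 3 * 64 * 128
    act_elems = 64 * 56 * 56          # input activations fetched once
    print(pick_weight_precision(macs, num_weights, act_elems))
```

Under these assumed numbers the layer is compute bound even at 2-bit weights, so the narrowest precision wins; with a slower bus or a weight-heavy layer, a wider precision can give the same latency, which is exactly the trade-off the accelerator's precision selection targets.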
