Efficient Reconfigurable Hardware Core for Convolutional Neural Networks

The Convolutional Neural Network (CNN) is one of the most promising methods in modern machine learning, but its intensive demand for computing resources limits its application to embedded systems. Since the energy consumption of a CNN is dominated by convolutions, methods such as the Winograd algorithm and fast FIR algorithms (FFAs) have been introduced to reduce the computational complexity of convolution. However, hardware implementations of these algorithms suffer from reduced efficiency when processing different CNN models, because their fixed architectures cannot efficiently support all sizes of convolution kernels. In this paper, for the first time, we propose an FFA-based all-size Reconfigurable Convolution Core (RCC) to tackle this problem. The proposed RCC can efficiently perform five mainstream sizes of convolution kernels while achieving a significant reduction in computational complexity compared with the conventional convolution architecture. Considering the strict resource budget of embedded systems, we explore a large design space to obtain an optimal trade-off between hardware utilization and reconfigurability. Moreover, we propose an overlapping dataflow scheme for the RCC that reduces the load on the communication bandwidth. The synthesis result shows that the proposed design can run at over 600 MHz.
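For intuition on how an FFA cuts the multiplication count, consider the classic 2-parallel fast FIR decomposition: the input and the filter are each split into even/odd polyphase components, and two output streams are computed with three half-length subfilters instead of four, via Y0 = H0·X0 + z^-1·H1·X1 and Y1 = (H0+H1)(X0+X1) − H0·X0 − H1·X1, trading roughly 25% of the multiplications for a few extra additions. The Python sketch below is illustrative only, not the paper's RCC implementation; the function name ffa2_conv is ours. It verifies the decomposition numerically against direct convolution:

```python
import numpy as np

def ffa2_conv(x, h):
    """2-parallel fast FIR algorithm (FFA): computes conv(x, h) with
    three half-length subfilters instead of four, i.e. ~25% fewer
    multiplications at the cost of a few pre-/post-additions."""
    assert len(x) % 2 == 0 and len(h) % 2 == 0, "even lengths for simplicity"
    x0, x1 = x[0::2], x[1::2]            # even/odd polyphase components of x
    h0, h1 = h[0::2], h[1::2]            # even/odd polyphase components of h
    p0 = np.convolve(x0, h0)             # subfilter 1: H0 * X0
    p1 = np.convolve(x1, h1)             # subfilter 2: H1 * X1
    p2 = np.convolve(x0 + x1, h0 + h1)   # subfilter 3: (H0 + H1) * (X0 + X1)
    L = len(p0)
    y0 = np.zeros(L + 1)                 # even output stream: P0 + z^-1 * P1
    y0[:L] += p0
    y0[1:] += p1
    y1 = p2 - p0 - p1                    # odd output stream, multiplier-free
    y = np.empty(len(x) + len(h) - 1)    # re-interleave the two streams
    y[0::2], y[1::2] = y0, y1
    return y

# Sanity check against direct convolution.
rng = np.random.default_rng(0)
x, h = rng.standard_normal(16), rng.standard_normal(6)
assert np.allclose(ffa2_conv(x, h), np.convolve(x, h))
```

The same three-subfilter structure can be applied recursively for higher degrees of parallelism, which is the structural regularity that FFA-based convolution cores exploit in hardware.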
