An FPGA-Based Reconfigurable CNN Accelerator for YOLO

Convolutional neural networks (CNNs) have been widely used in image processing. CNN-based object detection models, such as YOLO and SSD, have proven to be state-of-the-art in many applications. CNNs place extremely high demands on computing power and memory bandwidth, so they usually need to be deployed on a dedicated hardware platform. FPGAs offer strong advantages in reconfigurability and performance per watt, which makes them a suitable choice for deploying CNNs. In this paper, we propose a reconfigurable CNN accelerator with an AXI bus interface, based on an ARM + FPGA architecture. The accelerator receives configuration signals sent by the ARM processor and, through time-sharing, computes the different CNN layers during inference. By combining the convolution and pooling operations, the accelerator reduces the data movement between the convolutional and pooling layers and therefore the number of off-chip memory accesses. Floating-point numbers are converted into a 16-bit dynamic fixed-point format, which improves computational performance. We implemented the proposed architecture on the Xilinx ZCU102 FPGA for the YOLOv2 and YOLOv2 Tiny models on COCO and VOC 2007, respectively, achieving a peak performance of 289 GOPS at a 300 MHz clock frequency.
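The abstract does not say how the 16-bit dynamic fixed-point format is chosen. The minimal Python sketch here assumes a common scheme. Each group of values, such as one layer's weights or activations, gets its own fractional length, chosen so that the group's largest magnitude just fits in a signed 16-bit word. Multiplications then become integer operations: products are summed in a wide accumulator and shifted back to the output's fractional length. The helper names (`choose_frac_bits`, `quantize`, `dequantize`, `fixed_dot`), the round-half-to-even and saturation behaviour, and the sample values are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence

WORD_BITS = 16  # total width of a fixed-point word, sign bit included


def choose_frac_bits(values: Sequence[float], word_bits: int = WORD_BITS) -> int:
    """Pick one fractional length for a whole group (e.g. one layer's weights).

    Assumed rule: the integer part gets just enough bits for the largest
    magnitude in the group; the remaining bits (minus the sign bit) become
    fractional bits.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return word_bits - 1  # all zeros: any fractional length works
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    return word_bits - 1 - int_bits


def _saturate(q: int, word_bits: int) -> int:
    """Clamp an integer to the signed range of a word_bits-wide word."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def quantize(values: Sequence[float], frac_bits: int, word_bits: int = WORD_BITS) -> List[int]:
    """Float -> integer code: scale by 2**frac_bits, round half to even, saturate."""
    scale = 2.0 ** frac_bits
    return [_saturate(round(v * scale), word_bits) for v in values]


def dequantize(codes: Sequence[int], frac_bits: int) -> List[float]:
    """Integer code -> float, useful for checking the quantization error."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in codes]


def fixed_dot(a: Sequence[int], a_frac: int, b: Sequence[int], b_frac: int,
              out_frac: int, word_bits: int = WORD_BITS) -> int:
    """Integer dot product, as a multiply-accumulate array might compute it.

    Each product has a_frac + b_frac fractional bits; the sum is kept in a
    wide accumulator (Python ints are unbounded) and is only shifted back to
    out_frac fractional bits and saturated at the end.
    """
    acc = sum(x * y for x, y in zip(a, b))
    shift = a_frac + b_frac - out_frac
    # >> on a Python int is an arithmetic shift, i.e. it rounds toward -inf
    acc = acc >> shift if shift >= 0 else acc << -shift
    return _saturate(acc, word_bits)


if __name__ == "__main__":
    weights = [0.75, -1.3, 0.02, 2.6]  # hypothetical weights of one layer
    acts = [1.5, 0.25, -3.0, 0.5]      # hypothetical input activations
    w_frac, a_frac = choose_frac_bits(weights), choose_frac_bits(acts)
    ref = sum(w * a for w, a in zip(weights, acts))
    # For brevity, the output fractional length is taken from the float result;
    # a real flow would derive it from calibration data.
    o_frac = choose_frac_bits([ref])
    q_out = fixed_dot(quantize(weights, w_frac), w_frac,
                      quantize(acts, a_frac), a_frac, o_frac)
    print(f"frac bits w={w_frac} a={a_frac} out={o_frac}")
    print(f"float result {ref:.6f}, fixed-point result {q_out / 2.0 ** o_frac:.6f}")
```

Each group carries its own fractional length, so a layer with small weights keeps more fractional bits than a layer with large activations. This per-group scaling is what distinguishes dynamic fixed-point from a single global fixed-point format, while the arithmetic itself stays purely integer.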
