A High-Performance CNN Processor Based on FPGA for MobileNets

Convolutional neural networks (CNNs) have been widely applied to computer vision tasks. However, standard CNNs are hard to deploy on embedded devices because of their large number of operations and parameters. MobileNet, a state-of-the-art CNN that replaces standard convolution with depthwise separable convolution, significantly reduces operations and parameters with only a limited loss in accuracy. This paper proposes a high-performance FPGA-based CNN processor for MobileNets. To improve efficiency, we designed two dedicated computing engines, the Conv Engine for pointwise convolution and the Dwcv Engine for depthwise convolution, together with a schedule for the two engines that significantly improves the efficiency of the accelerator. Furthermore, we designed a dedicated architecture called Channel Augmentation to improve efficiency in the first layer of MobileNets, a standard convolution whose input has only three (RGB) channels. The accelerator can be flexibly deployed on various devices with different configurations to balance hardware resources against computational performance. We implemented the accelerator on ZU2 and ZU9 MPSoC FPGAs. ImageNet classification achieved 205.3 frames per second (fps) on ZU2 and 809.8 fps on ZU9, a 15.4x speedup on ZU2 and a 60.7x speedup on ZU9 compared with a CPU. We also deployed a MobileNet + SSD network on the accelerator for object detection and achieved 31.0 fps on ZU2 and 124.3 fps on ZU9.
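The savings from depthwise separable convolution can be made concrete with the operation counts given in the MobileNets paper: a standard D_K x D_K convolution from M to N channels on a D_F x D_F output costs D_K^2 * M * N * D_F^2 multiply-accumulates (MACs), while the depthwise plus pointwise pair costs D_K^2 * M * D_F^2 + M * N * D_F^2, a ratio of 1/N + 1/D_K^2. The Python sketch below counts MACs and weights for one layer. It is illustrative only: the helper names (`ConvLayer`, `standard_conv_cost`, `separable_conv_cost`) are hypothetical, the layer shape (3x3 kernel, 512 input and output channels, 14x14 output) is an assumed MobileNet-like example rather than a figure from this paper, and biases and batch normalization are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description; the output size is given directly,
    so stride and padding do not enter the counts below."""
    kernel: int    # D_K: spatial kernel size (D_K x D_K)
    in_ch: int     # M: number of input channels
    out_ch: int    # N: number of output channels
    out_size: int  # D_F: output feature map is D_F x D_F


def standard_conv_cost(layer: ConvLayer) -> tuple[int, int]:
    """Return (MACs, weights) for a standard D_K x D_K convolution."""
    weights = layer.kernel ** 2 * layer.in_ch * layer.out_ch
    macs = weights * layer.out_size ** 2  # every weight is used once per output pixel
    return macs, weights


def separable_conv_cost(layer: ConvLayer) -> tuple[int, int, int]:
    """Return (depthwise MACs, pointwise MACs, total weights)."""
    dw_weights = layer.kernel ** 2 * layer.in_ch  # one D_K x D_K filter per input channel
    pw_weights = layer.in_ch * layer.out_ch       # 1x1 convolution that mixes channels
    pixels = layer.out_size ** 2
    return dw_weights * pixels, pw_weights * pixels, dw_weights + pw_weights


# Assumed MobileNet-like layer, chosen for illustration only.
layer = ConvLayer(kernel=3, in_ch=512, out_ch=512, out_size=14)
std_macs, std_weights = standard_conv_cost(layer)
dw_macs, pw_macs, sep_weights = separable_conv_cost(layer)
sep_macs = dw_macs + pw_macs
ratio = sep_macs / std_macs

print(f"standard:  {std_macs:,} MACs, {std_weights:,} weights")
print(f"separable: {sep_macs:,} MACs, {sep_weights:,} weights")
print(f"MAC ratio: {ratio:.4f} vs 1/N + 1/D_K^2 = {1 / layer.out_ch + 1 / layer.kernel ** 2:.4f}")
print(f"pointwise share of separable MACs: {pw_macs / sep_macs:.1%}")
```

For this assumed layer the separable form needs about 11% of the MACs of the standard convolution, and the pointwise part accounts for roughly 98% of those MACs. The depthwise part, by contrast, performs no accumulation across input channels, so the two operations have quite different computational structure; how the Conv Engine and Dwcv Engine exploit this is not detailed in the abstract.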