SoC implementation of depthwise separable convolution for object detection

Depthwise separable convolution is used in convolutional neural networks (CNNs) to reduce the number of operations and parameters with only a limited loss in accuracy. However, state-of-the-art CNNs that adopt depthwise separable convolution still require a powerful computing platform such as a Graphics Processing Unit (GPU). In this paper, we first design three kinds of working models for an FPGA-based processing element (PE): a standard convolution model, a depthwise convolution model, and a pointwise convolution model. We then implement MobileNet-SSD on an SoC for object detection, using only limited resources. In addition, we propose a method to convert floating-point data into fixed-point data, which simplifies computation. The results show that our implementation detects objects at 13 fps.
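To make the two ideas in the abstract concrete, the following minimal Python sketch (not the authors' implementation) counts multiply-accumulate operations for a standard convolution versus a depthwise separable one, and shows one common way to convert a floating-point value to a signed fixed-point integer. The function names, the layer dimensions in the usage note, and the 16-bit word with 8 fractional bits are all assumptions chosen for illustration; the paper's actual conversion method and bit widths are not stated here.

```python
def standard_conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates for a standard k x k convolution layer."""
    return k * k * c_in * c_out * h_out * w_out


def depthwise_separable_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates for a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h_out * w_out   # one k x k filter per input channel
    pointwise = c_in * c_out * h_out * w_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer (assumed format).

    The value is scaled by 2**frac_bits, rounded to the nearest integer,
    and saturated to the range of a signed total_bits-wide word.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)
```

For a hypothetical layer with a 3 x 3 kernel, 32 input channels, 64 output channels, and a 112 x 112 output, the depthwise separable form needs 1/64 + 1/9 of the standard operation count, roughly 12.7%. With the assumed format, `to_fixed(0.5)` yields 128, and values outside the representable range saturate instead of wrapping around.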
