Asymmetric Convolution Networks Based on Multi-feature Fusion for Object Detection

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. Lightweight object detectors achieve high detection speed, but it is widely recognized that their detection accuracy is relatively low. To address this problem, we propose a new lightweight one-stage generic object detector, named ACFNet, whose design goal is to improve detection accuracy while maintaining high detection speed. In the backbone, we design an asymmetric convolution unit, the AC block, which employs three asymmetric convolution branches to enhance the multi-feature representation of CNNs. Following the backbone design of ShuffleNetV2, we present a backbone network called ACNet that uses AC blocks in place of standard convolutional layers, e.g., $3 \times 3$ layers. In addition, we design a deconvolution block to make all feature maps semantically stronger. In the prediction block, spatial attention helps the network better locate the feature distribution; following this principle, we add a spatial attention building block to the residual block of each prediction layer. Experiments on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate the effectiveness of the proposed method, which runs at 19 FPS on a single GPU.
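To make the AC block idea concrete, here is a minimal single-channel sketch in plain Python. It assumes (the abstract does not specify this) that the three branches are a square $3 \times 3$ kernel, a horizontal $1 \times 3$ kernel and a vertical $3 \times 1$ kernel applied in parallel with "same" zero padding, and that their outputs are summed. Batch normalization, activations and multiple channels are omitted, and the names `conv2d_same`, `add_maps` and `ac_block` are hypothetical illustrations, not ACFNet's actual implementation.

```python
from typing import List

Matrix = List[List[float]]  # a single-channel feature map or kernel


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """2-D cross-correlation (as computed by CNN conv layers) with zero
    padding so the output keeps the input size; kernel sides must be odd."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2  # 3x3 -> (1, 1), 1x3 -> (0, 1), 3x1 -> (1, 0)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # positions outside are zero padding
                        acc += x[r][c] * k[a][b]
            out[i][j] = acc
    return out


def add_maps(*maps: Matrix) -> Matrix:
    """Element-wise sum of equally sized feature maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def ac_block(x: Matrix, k_3x3: Matrix, k_1x3: Matrix, k_3x1: Matrix) -> Matrix:
    """Hypothetical AC block: three parallel branches whose outputs are summed.
    The BN layers and activation a real block would use are omitted."""
    return add_maps(conv2d_same(x, k_3x3), conv2d_same(x, k_1x3), conv2d_same(x, k_3x1))


if __name__ == "__main__":
    x = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0],
         [2.0, 0.0, 1.0]]
    k_3x3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # identity kernel
    k_1x3 = [[1.0, 0.0, -1.0]]       # horizontal difference
    k_3x1 = [[1.0], [0.0], [-1.0]]   # vertical difference
    print(ac_block(x, k_3x3, k_1x3, k_3x1))
    # [[-1.0, 2.0, -1.0], [-2.0, 0.0, 3.0], [2.0, 2.0, 4.0]]
```

Because every branch in this simplified sketch is linear and the padding keeps the kernel centers aligned, the summed output equals a single $3 \times 3$ convolution whose kernel adds the $1 \times 3$ kernel to its center row and the $3 \times 1$ kernel to its center column; whether ACFNet uses this property at inference time is not stated in the abstract.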
