Channel–Spatial fusion aware net for accurate and fast object Detection

A major challenge of object detection is that accurate detector is limited by speed due to enormous network, while the lightweight detector can reach real-time but its weak representation ability leads to the expense of accuracy. To overcome the issue, we propose a channel–spatial fusion awareness module (CSFA) to improve the accuracy by enhancing the feature representation of network at the negligible cost of complexity. Given a feature map, our method exploits two parts sequentially, channel awareness and spatial awareness, to reconstruct feature map without deepening the network. Because of the property of CSFA for easy integrating into any layer of CNN architectures, we assemble this module into ResNet-18 and DLA-34 in CenterNet to form a CSFA detector. Results consistently show that CSFA-Net runs in a fairly fast speed, and achieves state-of-the-art, i.e., mAP of 81.12% on VOC2007 and AP of 43.2% on COCO.

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Xu Tang,et al.  DuBox: No-Prior Box Objection Detection via Residual Dual Scale Detectors , 2019, ArXiv.

[4]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[6]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[7]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[8]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[12]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Jieping Ye,et al.  Object Detection in 20 Years: A Survey , 2019, Proceedings of the IEEE.

[18]  Fuqiang Zhou,et al.  FSSD: Feature Fusion Single Shot Multibox Detector , 2017, ArXiv.

[19]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[20]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[21]  Nassir Navab,et al.  Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks , 2018, MICCAI.