Multi-scale global context feature pyramid network for object detector

In order to capture more contextual information, various attention mechanisms are applied to object detectors. However, the spatial interaction in the commonly used attention mechanisms is single scale, and it cannot capture the context information of the objects from the feature maps of different scales, which will lead to the underutilization of the context information. In addition, since the predicted bounding box does not completely fit the shape and pose of the object, it has room for further improvement in the performance. In this paper, we propose a multi-scale global context feature pyramid network to obtain a feature pyramid with richer context information, which is a two-layer lightweight neck structure. Moreover, we extend the regression branch by adding an additional prediction head to predict the corner offsets of the bounding boxes to further refine the bounding boxes, which can effectively improve the accuracy of the predicted bounding boxes. Extensive experiments are conducted on the MS COCO 2017 detection datasets. Without bells and whistles, the proposed methods show an average 2% improvement over the RetinaNet baseline.

[1]  Zhaohui Zheng,et al.  Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression , 2019, AAAI.

[2]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Zhaoxiang Zhang,et al.  Scale-Aware Trident Networks for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[8]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[9]  Quoc V. Le,et al.  EfficientDet: Scalable and Efficient Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[11]  Emre Akbas,et al.  Reducing Label Noise in Anchor-Free Object Detection , 2020, BMVC.

[12]  Quoc V. Le,et al.  NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Larry S. Davis,et al.  AutoFocus: Efficient Multi-Scale Inference , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Rongrong Ji,et al.  FreeAnchor: Learning to Match Anchors for Visual Object Detection , 2019, NeurIPS.

[15]  Larry S. Davis,et al.  An Analysis of Scale Invariance in Object Detection - SNIP , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Marek Vajgl,et al.  Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3 , 2020, Neural Comput. Appl..

[19]  Junjie Yan,et al.  Grid R-CNN , 2018, 1811.12030.

[20]  Larry S. Davis,et al.  Learning From Noisy Anchors for One-Stage Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[22]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[24]  Stephen Lin,et al.  RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[29]  Stephen Lin,et al.  GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[30]  Hee Seok Lee,et al.  Probabilistic Anchor Assignment with IoU Prediction for Object Detection , 2020, ECCV.

[31]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[33]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[34]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Ling-Yu Duan,et al.  Towards Accurate One-Stage Object Detection With AP-Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[38]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[43]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Nuno Vasconcelos,et al.  Cascade R-CNN: High Quality Object Detection and Instance Segmentation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.