Dual Refinement Feature Pyramid Networks for Object Detection

FPN is a common component used in object detectors, it supplements multi-scale information by adjacent level features interpolation and summation. However, due to the existence of nonlinear operations and the convolutional layers with different output dimensions, the relationship between different levels is much more complex, the pixel-wise summation is not an efficient approach. In this paper, we first analyze the design defects from pixel level and feature map level. Then, we design a novel parameter-free feature pyramid networks named Dual Refinement Feature Pyramid Networks (DRFPN) for the problems. Specifically, DRFPN consists of two modules: Spatial Refinement Block (SRB) and Channel Refinement Block (CRB). SRB learns the location and content of sampling points based on contextual information between adjacent levels. CRB learns an adaptive channel merging method based on attention mechanism. Our proposed DRFPN can be easily plugged into existing FPN-based models. Without bells and whistles, for two-stage detectors, our model outperforms different FPN-based counterparts by 1.6 to 2.2 AP on the COCO detection benchmark, and 1.5 to 1.9 AP on the COCO segmentation benchmark. For one-stage detectors, DRFPN improves anchor-based RetinaNet by 1.9 AP and anchor-free FCOS by 1.3 AP when using ResNet50 as backbone. Extensive experiments verifies the robustness and generalization ability of DRFPN. The code will be made publicly available.

[1]  Kai Chen,et al.  Feature Pyramid Grids , 2020, ArXiv.

[2]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Limin Wang,et al.  LIP: Local Importance-Based Pooling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Kuiyuan Yang,et al.  Semantic Flow for Fast and Accurate Scene Parsing , 2020, ECCV.

[11]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[16]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Kai Chen,et al.  CARAFE: Content-Aware ReAssembly of FEatures , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[22]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jun Guo,et al.  Attention-guided Context Feature Pyramid Network for Object Detection , 2020, ArXiv.

[25]  Quoc V. Le,et al.  EfficientDet: Scalable and Efficient Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[29]  Xinjiang Wang,et al.  Scale-Equalizing Pyramid Convolution for Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[32]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[33]  Shiming Xiang,et al.  AugFPN: Improving Multi-Scale Feature Learning for Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Yuxing Peng,et al.  ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[36]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[38]  Jianping Shi,et al.  Improving Semantic Segmentation via Decoupled Body and Edge Supervision , 2020, ECCV.

[39]  Jinhui Tang,et al.  Feature Pyramid Transformer , 2020, ECCV.

[40]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[41]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Hang Xu,et al.  Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Xizhou Zhu,et al.  Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation , 2020, ICLR.

[44]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Pengfei Xiong,et al.  Pyramid Attention Network for Semantic Segmentation , 2018, BMVC.

[46]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[48]  Quoc V. Le,et al.  NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).