Single-Shot Two-Pronged Detector with Rectified IoU Loss

In the CNN based object detectors, feature pyramids are widely exploited to alleviate the problem of scale variation across object instances. These object detectors, which strengthen features via a top-down pathway and lateral connections, are mainly to enrich the semantic information of low-level features, but ignore the enhancement of high-level features. This can lead to an imbalance between different levels of features, in particular a serious lack of detailed information in the high-level features, which makes it difficult to get accurate bounding boxes. In this paper, we introduce a novel two-pronged transductive idea to explore the relationship among different layers in both backward and forward directions, which can enrich the semantic information of low-level features and detailed information of high-level features at the same time. Under the guidance of the two-pronged idea, we propose a Two-Pronged Network (TPNet) to achieve bidirectional transfer between high-level features and low-level features, which is useful for accurately detecting object at different scales. Furthermore, due to the distribution imbalance between the hard and easy samples in single-stage detectors, the gradient of localization loss is always dominated by the hard examples that have poor localization accuracy. This will enable the model to be biased toward the hard samples. So in our TPNet, an adaptive IoU based localization loss, named Rectified IoU (RIoU) loss, is proposed to rectify the gradients of each kind of samples. The Rectified IoU loss increases the gradients of examples with high IoU while suppressing the gradients of examples with low IoU, which can improve the overall localization accuracy of model. Extensive experiments demonstrate the superiority of our TPNet and RIoU loss.

[1]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[2]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[3]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Qi Qian,et al.  Learning to Rank Proposals for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[9]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yunhong Wang,et al.  Receptive Field Block Net for Accurate and Fast Object Detection , 2017, ECCV.

[11]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[12]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Ying Chen,et al.  M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network , 2018, AAAI.

[14]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[15]  Zhaohui Zheng,et al.  Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression , 2019, AAAI.

[16]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[17]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[19]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[23]  Hanqing Lu,et al.  CoupleNet: Coupling Global Structure with Local Parts for Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[26]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[27]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Lingxiao Yang,et al.  Dynamic Anchor Feature Selection for Single-Shot Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Yu Liu,et al.  Gradient Harmonized Single-stage Detector , 2018, AAAI.

[30]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[31]  Fahad Shahbaz Khan,et al.  Enriched Feature Guided Refinement Network for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Zhaoxiang Zhang,et al.  Scale-Aware Trident Networks for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).