Learning From Noisy Anchors for One-Stage Object Detection

State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding ground-truth objects. Such a harsh split conditioned on IoU results in binary labels that are potentially noisy and challenging for training. In this paper, we propose to mitigate noise incurred by imperfect label assignment such that the contributions of anchors are dynamically determined by a carefully constructed cleanliness score associated with each anchor. Exploring outputs from both regression and classification branches, the cleanliness scores, estimated without incurring any additional computational overhead, are used not only as soft labels to supervise the training of the classification branch but also sample re-weighting factors for improved localization and classification accuracy. We conduct extensive experiments on COCO, and demonstrate, among other things, the proposed approach steadily improves RetinaNet by ~2% with various backbones.

[1]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[3]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Rama Chellappa,et al.  Deep Regionlets for Object Detection , 2017, ECCV.

[6]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[8]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Nuno Vasconcelos,et al.  Cascade R-CNN: High Quality Object Detection and Instance Segmentation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Yu Liu,et al.  Gradient Harmonized Single-stage Detector , 2018, AAAI.

[12]  Larry S. Davis,et al.  R-FCN-3000 at 30fps: Decoupling Detection and Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[14]  Kai Chen,et al.  Region Proposal by Guided Anchoring , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Shiguang Shan,et al.  Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection , 2019, BMVC.

[16]  Bin Yang,et al.  Learning to Reweight Examples for Robust Deep Learning , 2018, ICML.

[17]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Zhaoxiang Zhang,et al.  Scale-Aware Trident Networks for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jinjun Xiong,et al.  Revisiting RCNN: On Awakening the Classification Power of Faster RCNN , 2018, ECCV.

[28]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[29]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Stephen Lin,et al.  RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Junjie Yan,et al.  Grid R-CNN , 2018, 1811.12030.

[32]  Yale Song,et al.  Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[34]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Li Fei-Fei,et al.  MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[37]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xingrui Yu,et al.  Co-teaching: Robust training of deep neural networks with extremely noisy labels , 2018, NeurIPS.

[39]  Jeff A. Bilmes,et al.  Combating Label Noise in Deep Learning Using Abstention , 2019, ICML.

[40]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Mohan S. Kankanhalli,et al.  Learning to Learn From Noisy Labeled Data , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Kai Chen,et al.  Prime Sample Attention in Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Tieniu Tan,et al.  POD: Practical Object Detection With Scale-Sensitive Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[45]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.