Training Robust Object Detectors From Noisy Category Labels and Imprecise Bounding Boxes

Object detection has improved greatly thanks to advances in convolutional neural networks and the availability of large amounts of accurately annotated training data. Although the amount of data keeps growing, existing crowd-sourcing labeling platforms cannot guarantee the quality of the annotations. Besides noisy category labels, imprecise bounding box annotations are common in object detection data. When the quality of the training data degrades, the performance of typical object detectors is severely impaired. In this paper, we propose a Meta-Refine-Net (MRNet) to train object detectors from noisy category labels and imprecise bounding boxes. First, MRNet learns to adaptively assign lower weights to proposals with incorrect labels, suppressing the large loss values these proposals generate on the classification branch. Second, MRNet learns to dynamically generate more accurate bounding box annotations to counteract the misleading supervision from imprecisely annotated boxes, so that imprecise boxes contribute positively to the regression branch instead of simply being ignored. Third, we propose to refine the imprecise bounding box annotations by jointly learning from both category and localization information, which yields a more accurate approximation of the ground-truth boxes and further reduces the misleading supervision. MRNet is model-agnostic and can learn from noisy object detection data with only a few clean examples (less than 2%). Extensive experiments on PASCAL VOC 2012 and MS COCO 2017 demonstrate the effectiveness and efficiency of our method.
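The idea of down-weighting wrongly labeled proposals with the help of a small clean set can be illustrated on a toy problem. The sketch below is not MRNet's weighting network: it swaps in the one-step lookahead rule from learning to reweight examples (Ren et al., 2018) and applies it to a hand-written logistic classifier rather than to detector proposals. The helper names (`meta_weights`, `weighted_step`), the learning rate and the data are hypothetical. Each example's weight is proportional to max(0, <g_clean, g_i>), the alignment between its loss gradient and the mean clean-set gradient, so an example whose label contradicts the clean set receives weight zero.

```python
import math
from typing import List, Sequence, Tuple

# A toy stand-in for a proposal: (features with a leading bias term, label in {0, 1}).
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_logistic(theta: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of the binary cross-entropy loss w.r.t. theta for one example."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(p - y) * xi for xi in x]


def meta_weights(theta: Sequence[float], noisy: List[Example],
                 clean: List[Example]) -> List[float]:
    """One-step lookahead weights (learning-to-reweight rule, NOT MRNet's network).

    For theta' = theta - lr * sum_i eps_i * g_i, the derivative of the clean loss
    at eps = 0 is -lr * <g_clean, g_i>, so w_i is proportional to max(0, <g_clean, g_i>).
    """
    g_clean = [0.0] * len(theta)
    for x, y in clean:
        for k, gk in enumerate(grad_logistic(theta, x, y)):
            g_clean[k] += gk / len(clean)
    raw = []
    for x, y in noisy:
        g_i = grad_logistic(theta, x, y)
        raw.append(max(0.0, sum(a * b for a, b in zip(g_clean, g_i))))
    total = sum(raw)
    # If no example would reduce the clean loss, all weights are zero (no update).
    return [r / total for r in raw] if total > 0 else [0.0] * len(raw)


def weighted_step(theta: Sequence[float], noisy: List[Example],
                  weights: Sequence[float], lr: float = 0.5) -> List[float]:
    """One SGD step on the weighted classification loss sum_i w_i * L_i."""
    new_theta = list(theta)
    for (x, y), w in zip(noisy, weights):
        for k, gk in enumerate(grad_logistic(theta, x, y)):
            new_theta[k] -= lr * w * gk
    return new_theta


clean = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
noisy = [([1.0, 1.5], 1), ([1.0, -1.5], 0), ([1.0, 1.8], 0)]  # last label is flipped
w = meta_weights([0.0, 0.0], noisy, clean)
print(w)  # [0.5, 0.5, 0.0]
print(weighted_step([0.0, 0.0], noisy, w))
```

The flipped example's gradient points against the clean-set gradient, so it is excluded from the update while the two correctly labeled examples share the weight. In a detector, the same rule would be applied per proposal on the classification loss; MRNet instead learns the weighting function.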
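Replacing an imprecise annotated box with a better regression target, using both category and localization cues, can be sketched with a fixed rule. This is a purely hypothetical illustration, not the refinement MRNet learns: the annotated box is blended toward the detector's predicted box with a coefficient equal to the predicted probability of the annotated class times the IoU between the two boxes. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and `iou` and `refine_box` are illustrative names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_box(annotated: Box, predicted: Box, cls_score: float) -> Box:
    """Hypothetical refinement rule (an assumption, not MRNet's learned module).

    cls_score is the detector's probability for the annotated category
    (category cue); the IoU measures agreement between the two boxes
    (localization cue). Their product, alpha in [0, 1], sets how far the
    regression target moves from the annotation toward the prediction.
    """
    alpha = min(1.0, max(0.0, cls_score)) * iou(annotated, predicted)
    x1, y1, x2, y2 = ((1.0 - alpha) * a + alpha * p
                      for a, p in zip(annotated, predicted))
    return (x1, y1, x2, y2)


noisy_box = (0.0, 0.0, 10.0, 10.0)
pred_box = (1.0, 1.0, 11.0, 11.0)
print(refine_box(noisy_box, pred_box, cls_score=0.9))
```

A confident, overlapping prediction pulls the target toward it, while a low class score or a disjoint prediction leaves the annotation unchanged, so the imprecise box still supervises the regression branch instead of being discarded.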
