SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining

Training with sparse annotations is known to reduce the performance of object detectors. Previous methods have focused on proxies for the missing ground-truth annotations in the form of pseudo-labels for unlabeled boxes. We observe that existing methods degrade at higher levels of sparsity because their pseudo-labels become noisy. To address this, we propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using pseudo-positive mining. The labeled regions are processed as usual, while the unlabeled regions are processed with self-supervised learning, thereby avoiding the negative effects of noisy pseudo-labels. This novel approach has multiple advantages, such as improved robustness to higher sparsity compared to existing methods. We conduct exhaustive experiments on five splits of the PASCAL-VOC and COCO datasets, achieving state-of-the-art performance. We also unify the various splits used across the literature for this task and present a standardized benchmark. On average, we improve by $2.6$, $3.9$ and $9.6$ mAP over previous state-of-the-art methods on three splits of increasing sparsity on COCO. Our project is publicly available at https://www.cs.umd.edu/~sakshams/SparseDet.
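
The separation of proposals into labeled and unlabeled regions can be illustrated with a minimal sketch. It is not the method as implemented in the paper: the function names `iou` and `split_proposals`, the use of a per-proposal objectness score, and both thresholds are assumptions chosen only to make the idea concrete. Proposals that overlap an available annotation are treated as labeled, confident proposals with no matching annotation are treated as unlabeled candidates (pseudo-positives), and the rest are treated as background.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_proposals(
    proposals: Sequence[Box],
    scores: Sequence[float],
    gt_boxes: Sequence[Box],
    iou_thresh: float = 0.5,    # assumed value, not taken from the paper
    score_thresh: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[List[int], List[int], List[int]]:
    """Return proposal indices grouped as (labeled, unlabeled, background)."""
    labeled: List[int] = []
    unlabeled: List[int] = []
    background: List[int] = []
    for i, (box, score) in enumerate(zip(proposals, scores)):
        # Best overlap with any of the (sparse) ground-truth annotations.
        best = max((iou(box, g) for g in gt_boxes), default=0.0)
        if best >= iou_thresh:
            labeled.append(i)      # matched to an annotation: supervised as usual
        elif score >= score_thresh:
            unlabeled.append(i)    # confident but unannotated: pseudo-positive
        else:
            background.append(i)   # low score and no annotation: negative
    return labeled, unlabeled, background


gt = [(0.0, 0.0, 10.0, 10.0)]
props = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0), (40.0, 40.0, 50.0, 50.0)]
objectness = [0.95, 0.95, 0.10]
labeled_idx, unlabeled_idx, background_idx = split_proposals(props, objectness, gt)
```

In this sketch the unlabeled group would be excluded from the standard classification loss and passed to a separate self-supervised objective instead, which is the role the abstract assigns to the unlabeled regions.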
