Soft Sampling for Robust Object Detection

We study the robustness of object detection under the presence of missing annotations. In this setting, the unlabeled object instances will be treated as background, which will generate an incorrect training signal for the detector. Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset. We provide a detailed explanation for this result. To further bridge the performance gap, we propose a simple yet effective solution, called Soft Sampling. Soft Sampling re-weights the gradients of RoIs as a function of overlap with positive instances. This ensures that the uncertain background regions are given a smaller weight compared to the hardnegatives. Extensive experiments on curated PASCAL VOC datasets demonstrate the effectiveness of the proposed Soft Sampling method at different annotation drop rates. Finally, we show that on OpenImagesV3, which is a real-world dataset with missing annotations, Soft Sampling outperforms standard detection baselines by over 3%.

[1]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Larry S. Davis,et al.  R-FCN-3000 at 30fps: Decoupling Detection and Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Martial Hebert,et al.  Semi-Supervised Self-Training of Object Detection Models , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[7]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Yuan Gao,et al.  Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance , 2018, ArXiv.

[9]  Larry S. Davis,et al.  An Analysis of Scale Invariance in Object Detection - SNIP , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[11]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Yuning Jiang,et al.  MegDet: A Large Mini-Batch Object Detector , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[16]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[19]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[20]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Yuxing Tang,et al.  Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.