Noisy Localization Annotation Refinement For Object Detection

The production of finely annotated datasets for object detection tasks is labor-intensive, therefore, cloud sourcing is often used to create datasets, which leads to these datasets tending to contain incorrect annotations such as inaccurate localization bounding boxes. In this study, we highlight a problem of object detection with noisy bounding box annotations and show that these noisy annotations are harmful to the performance of deep neural networks. To solve this problem, we further propose a framework to allow the network to modify the noisy datasets by alternating refinement. The experimental results demonstrate that our proposed framework can significantly alleviate the influences of noise on model performance.

[1]  Kiyoharu Aizawa,et al.  Joint Optimization Framework for Learning with Noisy Labels , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[3]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Frank Keller,et al.  Extreme Clicking for Efficient Object Annotation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Xiangyu Zhang,et al.  Bounding Box Regression With Uncertainty for Accurate Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Richard Nock,et al.  Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[9]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[13]  Simon Chadwick,et al.  Training Object Detectors With Noisy Data , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[14]  Arash Vahdat,et al.  Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks , 2017, NIPS.

[15]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[16]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.