论文信息 - PreDet: Large-scale weakly supervised pre-training for detection

PreDet: Large-scale weakly supervised pre-training for detection

State-of-the-art object detection approaches typically rely on pre-trained classification models to achieve better performance and faster convergence. We hypothesize that classification pre-training strives to achieve translation invariance, and consequently ignores the localization aspect of the problem. We propose a new large-scale pre-training strategy for detection, where noisy class labels are available for all images, but not bounding-boxes. In this setting, we augment standard classification pre-training with a new detection-specific pretext task. Motivated by the noise-contrastive learning based self-supervised approaches, we design a task that forces bounding boxes with high-overlap to have similar representations in different views of an image, compared to non-overlapping boxes. We redesign Faster R-CNN modules to perform this task efficiently. Our experimental results show significant improvements over existing weakly-supervised and self-supervised pre-training approaches in both detection accuracy as well as fine-tuning speed.

D. Mahajan | Vignesh Ramanathan | Rui Wang

[1] Armand Joulin,et al. Self-supervised Pretraining of Visual Features in the Wild , 2021, ArXiv.

[2] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.

[3] Jianfeng Gao,et al. Self-supervised Pre-training with Hard Examples Improves Visual Representations , 2020, ArXiv.

[4] Di Huang,et al. Improving Object Detection with Selective Self-supervised Self-training , 2020, ECCV.

[5] Julien Mairal,et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments , 2020, NeurIPS.

[6] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[7] Quoc V. Le,et al. Rethinking Pre-training and Self-training , 2020, NeurIPS.

[8] Chen Sun,et al. What makes for good views for contrastive learning , 2020, NeurIPS.

[9] Han Zhang,et al. A Simple Semi-Supervised Learning Framework for Object Detection , 2020, ArXiv.

[10] Wanli Ouyang,et al. Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection , 2020, ECCV.

[11] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Kaiming He,et al. Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.

[13] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[14] S. Gelly,et al. Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.

[15] Laurens van der Maaten,et al. Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Quoc V. Le,et al. EfficientDet: Scalable and Efficient Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Yuki M. Asano,et al. Self-labelling via simultaneous clustering and representation learning , 2019, ICLR.

[19] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[22] Yongjian Wu,et al. UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection , 2020, NeurIPS.

[23] Jian Sun,et al. Objects365: A Large-Scale, High-Quality Dataset for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24] Zhe L. Lin,et al. Scaling Object Detection by Transferring Classification Weights , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25] Taiji Suzuki,et al. Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[26] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.

[27] Ross B. Girshick,et al. LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Alexander Kolesnikov,et al. S4L: Self-Supervised Semi-Supervised Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30] Kan Chen,et al. Billion-scale semi-supervised learning for image classification , 2019, ArXiv.

[31] Xingyi Zhou,et al. Objects as Points , 2019, ArXiv.

[32] Larry S. Davis,et al. An Analysis of Pre-Training on Object Detection , 2019, ArXiv.

[33] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34] Marios Savvides,et al. Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Junjie Yan,et al. Grid R-CNN , 2018, 1811.12030.

[36] Kaiming He,et al. Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37] Tao Mei,et al. ScratchDet: Training Single-Shot Object Detectors From Scratch , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Nojun Kwak,et al. Consistency-based Semi-supervised Learning for Object detection , 2019, NeurIPS.

[39] Quoc V. Le,et al. DropBlock: A regularization method for convolutional networks , 2018, NeurIPS.

[40] Weilin Huang,et al. CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images , 2018, ECCV.

[41] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[42] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[43] Ian D. Reid,et al. Bootstrapping the Performance of Webly Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44] Stella X. Yu,et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Bolei Zhou,et al. Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[47] Ashok Veeraraghavan,et al. Learning from Noisy Web Data with Category-Level Supervision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48] Kaiming He,et al. Data Distillation: Towards Omni-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Jianfei Cai,et al. Zero-Annotation Object Detection with Web Knowledge Transfer , 2017, ECCV.

[50] Xiaogang Wang,et al. Chained Cascade Network for Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51] Zhiqiang Shen,et al. DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[53] Armand Joulin,et al. Unsupervised Learning by Predicting Noise , 2017, ICML.

[54] Seunghoon Hong,et al. Weakly Supervised Semantic Segmentation Using Web-Crawled Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Yao Li,et al. Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning from Web Data , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56] Yunchao Wei,et al. STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Thomas Brox,et al. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60] Xinlei Chen,et al. Webly Supervised Learning of Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61] Andrea Vedaldi,et al. Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[62] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[63] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[64] Ali Farhadi,et al. Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[65] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[66] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[67] Lorenzo Torresani,et al. Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach , 2010, NIPS.

[68] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[69] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.