Active Learning Strategies for Weakly-supervised Object Detection

Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using “box-in-box” (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly-supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly-supervised detector’s performance with only a few fully-annotated images per class. BiB reaches 97% of the performance of fully-supervised Fast RCNN with only 10% of fully-annotated images on VOC07. On COCO, using on average 10 fully-annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly-supervised detector and the fully-supervised Fast RCNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/BiB .
