StarNet: towards Weakly Supervised Few-Shot Object Detection

Few-shot detection and classification have advanced significantly in recent years. Yet detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, while classification approaches rarely localize objects in the scene. In this paper, we introduce StarNet, a few-shot model featuring an end-to-end differentiable, non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce features that support jointly localizing and classifying previously unseen categories in few-shot test tasks. It does so with a star model that geometrically matches query and support images to find corresponding object instances. Unlike existing few-shot detection approaches, StarNet does not require any bounding box annotations, either during pre-training or for adaptation to novel classes. It can therefore be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks whose images are less tightly cropped around the objects, where object localization is key.
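The star-model matching described above can be illustrated with a minimal, inference-only sketch in the spirit of classic star models (Hough voting for an object center). This is not the authors' implementation. The function names `l2_normalize`, `star_vote_map` and `classify_and_localize` are hypothetical. The sketch assumes that the support object's center is the center cell of the support feature map, that match strength is non-negative cosine similarity, and that each class is scored by its single highest vote peak. StarNet's actual head is end-to-end differentiable and meta-trained with image-level labels. This NumPy sketch only shows how geometrically consistent matches between a query and a support image accumulate at a common center, which both scores the class and localizes the object.

```python
import numpy as np


def l2_normalize(feats, eps=1e-8):
    """Scale each feature vector (last axis) to unit length; zero vectors stay zero."""
    return feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + eps)


def star_vote_map(query, support):
    """Accumulate star-model (Hough-style) votes for object centers in the query.

    query:   (Hq, Wq, C) feature map of the query image.
    support: (Hs, Ws, C) feature map of one support image.
    ASSUMPTION: the object's "star center" in the support image is taken to be
    the center cell of its feature map (a simplification for illustration).
    """
    Hq, Wq, C = query.shape
    Hs, Ws, _ = support.shape
    q = l2_normalize(query).reshape(-1, C)
    s = l2_normalize(support).reshape(-1, C)
    # ASSUMPTION: non-negative cosine similarity as the match strength.
    sim = np.clip(q @ s.T, 0.0, None)  # shape (Hq*Wq, Hs*Ws)

    cy, cx = Hs // 2, Ws // 2
    votes = np.zeros((Hq, Wq))
    for qi in range(Hq * Wq):
        qy, qx = divmod(qi, Wq)
        for si in range(Hs * Ws):
            w = sim[qi, si]
            if w == 0.0:
                continue
            sy, sx = divmod(si, Ws)
            # A query cell matched to support cell (sy, sx) votes for the query
            # position with the same offset to the center as in the support.
            vy, vx = qy + (cy - sy), qx + (cx - sx)
            if 0 <= vy < Hq and 0 <= vx < Wq:
                votes[vy, vx] += w
    return votes


def classify_and_localize(query, supports):
    """Score each class by its highest vote peak; return (label, score, peak)."""
    best = None
    for label, support in supports.items():
        votes = star_vote_map(query, support)
        peak = tuple(int(i) for i in np.unravel_index(np.argmax(votes), votes.shape))
        score = float(votes[peak])
        if best is None or score > best[1]:
            best = (label, score, peak)
    return best


if __name__ == "__main__":
    # Toy features: two orthogonal "parts" A and B; zero vectors are background.
    A, B = np.eye(4)[0], np.eye(4)[1]
    support = np.zeros((3, 3, 4))
    support[0, 1], support[2, 1] = A, B   # parts above and below the center (1, 1)
    query = np.zeros((5, 5, 4))
    query[1, 3], query[3, 3] = A, B       # same layout, shifted in the query
    other = np.zeros((3, 3, 4))
    other[1, 1] = np.eye(4)[2]            # a class whose part never appears
    label, score, peak = classify_and_localize(query, {"other": other, "target": support})
    print(label, peak)  # target (2, 3)
```

Both parts of the query object vote for the same cell (2, 3), so the matching class wins and the peak marks the object's center. The nested loops keep the voting geometry explicit. A practical version would vectorize the scatter-add and, to be trainable, express it in a framework with automatic differentiation.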
