Few-shot Learning with Contextual Cueing for Object Recognition in Complex Scenes

Few-shot learning aims to recognize new concepts from a small number of training examples. Recent work mainly tackles this problem by improving visual features, feature transfer, and meta-training algorithms. In this work, we explore a complementary direction: using scene context semantics to learn and recognize new concepts more easily. Since a few visual examples cannot cover all intra-class variations, contextual cueing offers a complementary signal for classifying instances with unseen features or ambiguous objects. More specifically, we propose a Class-conditioned Context Attention Module (CCAM) that learns to weight the most important context elements while learning a particular concept. We additionally propose a flexible gating mechanism to ground visual class representations in context semantics. We conduct extensive experiments on the Visual Genome dataset and show that, compared to a visual-only baseline, our model improves top-1 accuracy by 20.47% and 9.13% in the 5-way 1-shot and 5-way 5-shot settings, respectively, and by 20.42% and 12.45% in the 20-way 1-shot and 20-way 5-shot settings.
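To make the two components concrete, here is a minimal PyTorch sketch of a class-conditioned context attention module and a gated visual-context fusion. Everything in it — the module names, the linear projections, the scaled dot-product scoring, and the sigmoid gate — is an illustrative assumption in the spirit of the abstract (the gate follows the style of gated multimodal units), not the paper's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CCAM(nn.Module):
    """Class-conditioned Context Attention Module (illustrative sketch).

    Scores each context element (e.g., the embedding of a co-occurring
    object or scene label) against a class prototype, then pools the
    context with the resulting attention weights.
    """

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the class prototype
        self.key = nn.Linear(dim, dim)    # projects each context element

    def forward(self, prototype, context):
        # prototype: (batch, dim); context: (batch, n_ctx, dim)
        q = self.query(prototype).unsqueeze(1)        # (batch, 1, dim)
        k = self.key(context)                         # (batch, n_ctx, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5  # (batch, n_ctx)
        attn = F.softmax(scores, dim=-1)
        # Weighted sum of context elements -> one context vector per class.
        return (attn.unsqueeze(-1) * context).sum(1)  # (batch, dim)


class GatedFusion(nn.Module):
    """Gating mechanism grounding visual prototypes in context semantics.

    A learned sigmoid gate decides, per dimension, how much of the final
    class representation comes from the visual prototype versus the
    attended context vector.
    """

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual, context):
        # visual, context: (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([visual, context], dim=-1)))
        return g * visual + (1 - g) * context
```

In a prototypical-network-style pipeline, the fused vector would serve as the class prototype and query images would be classified by nearest prototype; the gate lets the model fall back on purely visual evidence whenever the available context is uninformative.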
