Towards Contextual Learning in Few-shot Object Classification

Few-shot Learning (FSL) aims to classify new concepts from a small number of examples. While there has been an increasing amount of work on few-shot object classification in recent years, most current approaches are limited to images containing a single, centered object. In contrast, humans can leverage prior knowledge, such as semantic relations with contextual elements, to quickly learn new concepts. Inspired by the concept of contextual learning in educational sciences, we take a step towards adopting this principle in FSL by studying the contribution of context to object classification in a low-data regime. To this end, we first propose an approach to perform FSL on images of complex scenes. We develop two plug-and-play modules that can be incorporated into existing FSL methods to enable them to leverage contextual learning. More specifically, these modules are trained to weight the most important context elements while learning a particular concept, and then use this knowledge to ground visual class representations in context semantics. Extensive experiments on Visual Genome and Open Images show the superiority of contextual learning over learning individual objects in isolation.

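To make the two described modules concrete, the following is a minimal sketch (not the authors' implementation) of how a plug-and-play context-weighting and grounding step might look: an attention mechanism scores context-element embeddings (e.g., word embeddings of co-occurring objects) against a visual class prototype, and a gated fusion grounds the prototype in the resulting context vector. All module names, dimensions, and the gating choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextWeighting(nn.Module):
    """Scores context elements against the visual prototype and returns their weighted sum."""
    def __init__(self, visual_dim, context_dim, hidden_dim=128):
        super().__init__()
        self.query = nn.Linear(visual_dim, hidden_dim)
        self.key = nn.Linear(context_dim, hidden_dim)

    def forward(self, prototype, context):        # prototype: (D_v,), context: (N, D_c)
        scores = self.key(context) @ self.query(prototype)   # (N,) similarity scores
        weights = F.softmax(scores, dim=0)                    # attention over context elements
        return weights @ context                              # (D_c,) weighted context vector

class GatedGrounding(nn.Module):
    """Fuses the visual prototype with the context vector through a learned gate."""
    def __init__(self, visual_dim, context_dim):
        super().__init__()
        self.to_visual = nn.Linear(context_dim, visual_dim)
        self.gate = nn.Linear(visual_dim + context_dim, visual_dim)

    def forward(self, prototype, context_vec):
        g = torch.sigmoid(self.gate(torch.cat([prototype, context_vec], dim=-1)))
        return g * prototype + (1 - g) * self.to_visual(context_vec)

# Usage sketch: enrich a Prototypical-Networks-style class prototype with context.
visual_dim, context_dim, n_context = 512, 300, 6
prototype = torch.randn(visual_dim)             # mean of support-set visual features (assumed)
context = torch.randn(n_context, context_dim)   # embeddings of surrounding scene objects (assumed)
weigh = ContextWeighting(visual_dim, context_dim)
ground = GatedGrounding(visual_dim, context_dim)
contextual_prototype = ground(prototype, weigh(prototype, context))
```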