Knowledge-Based Fine-Grained Classification For Few-Shot Learning

Fine-grained few-shot image classification is difficult because the small inter-class variance and the large intra-class variance leave the machine with too little information to learn from only a few images. External knowledge carries richer semantics and can help the model extract important features, yet most existing few-shot learning algorithms leverage only the visual features of images; little attention has been paid to cross-modal external knowledge. In this paper, we propose a knowledge-based fine-grained classification mechanism for few-shot learning, which overcomes the difficulty of extracting only limited discriminative features from unimodal samples. We extract visual features, together with knowledge features from textual descriptions and a domain-specific knowledge graph, at both global and local levels to build a semantic space. To bridge the gap between multimodal features, we propose a mirror framework, named the Mirror Mapping Network (MMN), which maps the multimodal features into the same semantic space in two directions. Extensive experimental results show that our method outperforms the state of the art.
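The core idea of mapping multimodal features into a shared semantic space in two directions can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual MMN architecture: the dimensions, the linear projections `W_v`/`W_t`, and the specific loss terms are all assumptions for illustration, standing in for learned network components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: visual, textual/knowledge, shared semantic.
d_vis, d_txt, d_sem = 512, 300, 128

# Hypothetical projection matrices (learned by the network in practice).
W_v = rng.normal(scale=0.01, size=(d_vis, d_sem))  # visual -> semantic
W_t = rng.normal(scale=0.01, size=(d_txt, d_sem))  # knowledge -> semantic

def to_semantic(x_vis, x_txt):
    """Map both modalities into the same semantic space (forward direction)."""
    return x_vis @ W_v, x_txt @ W_t

def mirror_loss(z_v, z_t, x_vis, x_txt):
    """Alignment term plus two 'mirror' reconstruction terms, so the mapping
    is constrained in both directions: modality -> semantic and semantic ->
    modality (here sketched with the transposed projections)."""
    align = np.mean((z_v - z_t) ** 2)          # pull paired features together
    rec_v = np.mean((z_v @ W_v.T - x_vis) ** 2)  # semantic -> visual
    rec_t = np.mean((z_t @ W_t.T - x_txt) ** 2)  # semantic -> knowledge
    return align + rec_v + rec_t
```

In this sketch the mirror constraint means the shared embedding must both align the two modalities and remain invertible enough to reconstruct each one, which is one common way to regularize a cross-modal mapping.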
