Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding

Few-shot learning aims to adapt rapidly to novel categories with only a handful of samples at test time, and has been predominantly tackled with meta-learning. However, meta-learning approaches essentially learn across a variety of few-shot tasks and thus still require large-scale training data with fine-grained supervision to derive a generalized model, which incurs a prohibitive annotation cost. In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification, where the model observes only coarse labels during training but is expected to perform fine-grained classification during testing. This task largely relieves the annotation cost, since fine-grained labeling usually requires strong domain-specific expertise. To bridge the cross-granularity gap, we approximate the fine-grained data distribution by greedily clustering each coarse class into pseudo-fine classes according to the similarity of image embeddings. We then propose a meta-embedder that jointly optimizes visual and semantic discrimination, at both the instance level and the coarse-class level, to obtain a good feature space for this coarse-to-fine pseudo-labeling process. Extensive experiments and ablation studies on three representative datasets demonstrate the effectiveness and robustness of our approach.
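
The coarse-to-fine pseudo-labeling step can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so the code below assumes a simple leader-style greedy rule: within each coarse class, a sample joins the most similar existing cluster if its cosine similarity to that cluster's centroid reaches a threshold, and otherwise starts a new pseudo-fine class. The function name `greedy_pseudo_labels`, the `threshold` parameter, and the use of cosine similarity are all hypothetical choices made for illustration, not the paper's actual algorithm.

```python
import numpy as np


def greedy_pseudo_labels(embeddings, coarse_labels, threshold=0.8):
    """Assign a pseudo-fine label to every sample (illustrative sketch only).

    embeddings:    (n, d) array of image embeddings.
    coarse_labels: (n,) array of coarse class ids.
    threshold:     assumed cosine-similarity cutoff for joining a cluster.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    coarse_labels = np.asarray(coarse_labels)

    # L2-normalise so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    pseudo = np.full(len(unit), -1, dtype=int)
    next_id = 0  # pseudo-fine ids are unique across coarse classes

    for c in np.unique(coarse_labels):
        sums = []  # running sum of member vectors, one entry per cluster
        ids = []   # global pseudo-fine id of each cluster in this coarse class
        for i in np.flatnonzero(coarse_labels == c):
            if sums:
                centroids = np.stack(sums)
                centroids /= np.clip(
                    np.linalg.norm(centroids, axis=1, keepdims=True), 1e-12, None
                )
                sims = centroids @ unit[i]
                best = int(np.argmax(sims))
                if sims[best] >= threshold:
                    # Close enough: join the most similar existing cluster.
                    sums[best] = sums[best] + unit[i]
                    pseudo[i] = ids[best]
                    continue
            # No cluster is similar enough: open a new pseudo-fine class.
            sums.append(unit[i].copy())
            ids.append(next_id)
            pseudo[i] = next_id
            next_id += 1

    return pseudo
```

Because clusters are never shared between coarse classes, every pseudo-fine class is a strict refinement of exactly one coarse class. The result depends on the sample order and on the threshold, both of which would need tuning in practice.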
