Label Hallucination for Few-Shot Classification

Few-shot classification requires adapting knowledge learned from a large annotated base dataset to recognize novel, unseen classes, each represented by only a few labeled examples. In such a scenario, pretraining a high-capacity network on the large dataset and then finetuning it on the few examples causes severe overfitting. Conversely, training a simple linear classifier on top of “frozen” features learned from the large labeled dataset fails to adapt the model to the properties of the novel classes, effectively inducing underfitting. In this paper we propose an alternative to both of these popular strategies. First, our method pseudo-labels the entire large dataset using the linear classifier trained on the novel classes. This effectively “hallucinates” the novel classes in the large dataset, even though the novel categories are absent from the base dataset (novel and base classes are disjoint). Then, it finetunes the entire model with a distillation loss on the pseudo-labeled base examples, in addition to the standard cross-entropy loss on the novel dataset. This step trains the network to recognize contextual and appearance cues that are useful for novel-category recognition, but it does so using the entire large-scale base dataset, thus overcoming the inherent data scarcity of few-shot learning. Despite the simplicity of the approach, we show that our method outperforms the state-of-the-art on four well-established few-shot classification benchmarks.
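To make the procedure above concrete, the sketch below illustrates the three steps in PyTorch: fitting a linear head on frozen features of the novel support set, “hallucinating” soft novel-class labels for the base images, and finetuning the whole network with a distillation loss on those pseudo-labels plus cross-entropy on the support set. All function names, the distillation formulation (temperature-scaled KL divergence), and the hyper-parameters (T, alpha, learning rates, step counts) are illustrative assumptions, not the exact recipe of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fit_linear_head(encoder, support_x, support_y, n_way, feat_dim, steps=100):
    """Step 1: train a linear classifier on frozen features of the novel support set."""
    head = nn.Linear(feat_dim, n_way)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    with torch.no_grad():
        feats = encoder(support_x)          # backbone features stay frozen here
    for _ in range(steps):
        loss = F.cross_entropy(head(feats), support_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return head


@torch.no_grad()
def hallucinate_labels(encoder, head, base_x, T=4.0):
    """Step 2: pseudo-label ("hallucinate") the base images with the novel-class head,
    ignoring their original base-class annotations (batched in practice)."""
    logits = head(encoder(base_x))
    return F.softmax(logits / T, dim=1)     # soft novel-class labels


def finetune(encoder, head, base_x, base_soft_y, support_x, support_y,
             T=4.0, alpha=0.5, lr=1e-4, steps=1000, batch=64):
    """Step 3: finetune the whole network with a distillation loss on the
    pseudo-labeled base examples plus cross-entropy on the novel support set."""
    params = list(encoder.parameters()) + list(head.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(steps):
        idx = torch.randint(len(base_x), (batch,))          # mini-batch of base images
        log_p = F.log_softmax(head(encoder(base_x[idx])) / T, dim=1)
        distill = F.kl_div(log_p, base_soft_y[idx], reduction="batchmean") * T * T
        ce = F.cross_entropy(head(encoder(support_x)), support_y)
        loss = alpha * distill + (1.0 - alpha) * ce
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder, head
```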
