Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition

Most meta-learning approaches assume the existence of a very large set of labeled data available for episodic meta-learning of base knowledge. This contrasts with the more realistic continual learning paradigm, in which data arrives incrementally in the form of tasks containing disjoint classes. In this paper we consider this problem of Incremental Meta-Learning (IML), in which classes are presented incrementally in discrete tasks. We propose an approach to IML, which we call Episodic Replay Distillation (ERD), that mixes classes from the current task with class exemplars from previous tasks when sampling episodes for meta-learning. These episodes are then used for knowledge distillation to minimize catastrophic forgetting. Experiments on four datasets demonstrate that ERD surpasses the state of the art. In particular, on the more challenging one-shot, long-task-sequence incremental meta-learning scenarios, we reduce the gap between IML and the joint-training upper bound from 3.5% / 10.1% / 13.4% with the current state of the art to 2.6% / 2.9% / 5.0% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100, respectively.

Introduction

Meta-learning, also commonly referred to as "learning to learn", is a learning paradigm in which a model gains experience over a sequence of learning episodes.¹ This experience is optimized so as to improve the model's future learning performance on unseen tasks (Hospedales et al. 2021). Meta-learning is one of the most promising techniques for learning models that can flexibly generalize, like humans, to new tasks and environments not seen during training. This capability is generally considered to be crucial for future AI systems.

Few-shot learning has emerged as the paradigm of choice to test and evaluate meta-learning algorithms. It aims to learn from a very limited number of samples (as few as just one), and meta-learning applied to few-shot image recognition in particular has attracted increased attention in recent years (Su, Maji, and Hariharan 2020; Bateni et al. 2020; Li et al. 2020; Yang, Liu, and Xu 2021). However, most few-shot learning methods are limited in their learning modes: they must train with a large number of labeled samples available in advance.

¹ To avoid ambiguities, we use the term episode in the sense used in meta-learning rather than how it is used in continual learning. We use task in the sense of continual learning to refer to a disjoint group of new classes.

[Figure: overview diagram with panels labeled Meta-test data, Episode 1 to Episode n, Model (init), Exemplar memory (empty), Task 3 data, and Model (t=3).]
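Since the abstract only states the two ingredients of ERD at a high level, the sketch below illustrates how they might look in code: an episode sampler that mixes current-task classes with exemplar classes from previous tasks, and a soft-target distillation loss applied to the resulting episodes. This is a minimal illustration in PyTorch-style Python, not the authors' implementation; the names (sample_episode, distillation_loss, n_way, k_shot) and the particular distillation form (temperature-scaled KL divergence) are assumptions made for the sketch.

```python
import random

import torch.nn.functional as F


def sample_episode(current_data, exemplar_memory, n_way=5, k_shot=1, n_query=15):
    """Sample one n-way, k-shot episode whose classes are drawn jointly
    from the current task and from the exemplar memory of previous tasks.

    Both arguments are dicts mapping class id -> list of samples; class ids
    of different tasks are assumed to be disjoint (as in the IML setting).
    """
    pool = dict(current_data)
    pool.update(exemplar_memory)              # mix new classes with replayed ones
    classes = random.sample(sorted(pool), n_way)
    support, query = [], []
    for c in classes:
        items = random.sample(pool[c], k_shot + n_query)
        support += [(x, c) for x in items[:k_shot]]
        query += [(x, c) for x in items[k_shot:]]
    return support, query


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target knowledge distillation between the model after the current
    task (student) and a frozen copy from before it (teacher), computed on the
    query predictions of replayed episodes."""
    teacher = F.softmax(teacher_logits / temperature, dim=-1)
    student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
```

Sampling episodes from the union of the current task data and the exemplar memory is what lets the distillation term constrain the updated model on previously seen classes without storing the full data of earlier tasks.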

References

[1] Razvan Pascanu, et al. Overcoming Catastrophic Forgetting in Neural Networks. Proceedings of the National Academy of Sciences, 2016.

[2] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images. 2009.

[3] Min Lin, et al. Online Fast Adaptation and Knowledge Accumulation: A New Approach to Continual Learning. arXiv preprint, 2020.

[4] Leonid Sigal, et al. Improved Few-Shot Visual Classification. CVPR, 2020.

[5] Tao Xiang, et al. Learning to Compare: Relation Network for Few-Shot Learning. CVPR, 2018.

[6] Richard S. Zemel, et al. Prototypical Networks for Few-Shot Learning. NIPS, 2017.

[7] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning. CVPR, 2017.

[8] Joshua B. Tenenbaum, et al. Meta-Learning for Semi-Supervised Few-Shot Classification. ICLR, 2018.

[9] Gunshi Gupta, et al. La-MAML: Look-ahead Meta Learning for Continual Learning. NeurIPS, 2020.

[10] Oriol Vinyals, et al. Matching Networks for One Shot Learning. NIPS, 2016.

[11] Jian Sun, et al. Deep Residual Learning for Image Recognition. CVPR, 2016.

[12] Yandong Guo, et al. Large Scale Incremental Learning. CVPR, 2019.

[13] Bogdan Raducanu, et al. Generative Feature Replay for Class-Incremental Learning. CVPR Workshops, 2020.

[14] Xiaopeng Hong, et al. Few-Shot Class-Incremental Learning. CVPR, 2020.

[15] James T. Kwok, et al. Generalizing from a Few Examples. ACM Computing Surveys, 2019.

[16] Stefano Soatto, et al. Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment. ECCV, 2020.

[17] Min Xu, et al. Free Lunch for Few-Shot Learning: Distribution Calibration. ICLR, 2021.

[18] Derek Hoiem, et al. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[19] Sung Whan Yoon, et al. XtarNet: Learning to Extract Task-Adaptive Representation for Incremental Few-Shot Learning. ICML, 2020.

[20] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML, 2017.

[21] Yee Whye Teh, et al. Progress & Compress: A Scalable Framework for Continual Learning. ICML, 2018.

[22] Bogdan Raducanu, et al. Memory Replay GANs: Learning to Generate New Categories without Forgetting. NeurIPS, 2018.

[23] Christopher Kanan, et al. REMIND Your Neural Network to Prevent Catastrophic Forgetting. ECCV, 2020.

[24] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms. arXiv preprint, 2018.

[25] Pietro Perona, et al. The Caltech-UCSD Birds-200-2011 Dataset. 2011.

[26] Dahua Lin, et al. Learning a Unified Classifier Incrementally via Rebalancing. CVPR, 2019.

[27] Matthieu Cord, et al. PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning. ECCV, 2020.

[28] Amos Storkey, et al. Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[29] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. 1989.

[30] Nikos Komodakis, et al. Dynamic Few-Shot Visual Learning Without Forgetting. CVPR, 2018.

[31] Tinne Tuytelaars, et al. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.