Continual Few-Shot Learning Using HyperTransformers

We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined by a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. To learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. In this way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that new tasks can be learned without forgetting past ones. This approach differs from most continual learning algorithms, which typically rely on replay buffers, weight regularization, or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method, equipped with a prototypical loss, is capable of learning and retaining knowledge about past tasks in a variety of scenarios, including learning from mini-batches as well as task-incremental and class-incremental learning.
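The central mechanism described above is the recursive re-use of generated CNN weights as the hypernetwork's memory of past tasks, trained with a prototypical loss. The following is a minimal sketch, in PyTorch, of what such a loop could look like; the names (ht, cnn_forward, continual_episode, prototypical_loss) and the way losses are aggregated across tasks are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (not the authors' code) of the recursive weight-update loop:
# a hypernetwork `ht` consumes the support set of task t together with the CNN
# weights produced for tasks 1..t-1 and emits updated CNN weights.
import torch
import torch.nn as nn


def prototypical_loss(support_emb, support_y, query_emb, query_y):
    """Standard prototypical-network loss: class prototypes are support-set
    means; queries are classified by negative squared distance to prototypes."""
    classes = support_y.unique()
    protos = torch.stack([support_emb[support_y == c].mean(0) for c in classes])
    dists = torch.cdist(query_emb, protos)           # [num_query, num_classes]
    targets = torch.stack([(classes == y).nonzero().squeeze() for y in query_y])
    return nn.functional.cross_entropy(-dists, targets)


def continual_episode(ht, cnn_forward, tasks, init_weights):
    """Run one continual sequence of few-shot tasks.

    ht           -- hypernetwork mapping (support set, previous weights) -> new weights
    cnn_forward  -- functional CNN: cnn_forward(weights, images) -> embeddings
    tasks        -- list of (support_x, support_y, query_x, query_y) episodes
    init_weights -- initial (e.g. zero or learned) CNN weight tensors
    """
    weights = init_weights
    total_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Weights generated for earlier tasks are fed back into the hypernetwork,
        # acting as a representation of what has already been learned.
        weights = ht(support_x, support_y, weights)
        # Evaluate with a prototypical loss; in the full method, queries from all
        # tasks seen so far would be evaluated to penalize forgetting (only the
        # current task's queries are shown here for brevity).
        support_emb = cnn_forward(weights, support_x)
        query_emb = cnn_forward(weights, query_x)
        total_loss = total_loss + prototypical_loss(
            support_emb, support_y, query_emb, query_y)
    return total_loss, weights
```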
