Continual Few-shot Learning with Transformer Adaptation and Knowledge Regularization

Continual few-shot learning, a paradigm that solves continual learning and few-shot learning simultaneously, has become a challenging problem in machine learning. A capable continual few-shot learning model is expected to distinguish all seen classes as new categories arrive, where each category includes only a few labeled samples. However, existing continual few-shot learning methods consider only the visual modality, in which the distributions of new categories often overlap indistinguishably with those of old categories, resulting in severe catastrophic forgetting. To tackle this problem, in this paper we study continual few-shot learning with the assistance of semantic knowledge, taking both the visual modality and the semantic concepts of categories into account. We propose a Continual few-shot learning algorithm with Semantic knowledge Regularization (CoSR), which adapts to distribution changes of visual prototypes through a Transformer-based prototype adaptation mechanism. Specifically, the original visual prototypes from the backbone are fed into the Transformer together with the corresponding semantic concepts, where the semantic concepts are extracted from all categories. A semantic-level regularization forces categories with similar semantics to be closely distributed, while categories with dissimilar semantics are constrained to be far apart. This regularization improves the model's ability to distinguish between new and old categories, thereby significantly mitigating catastrophic forgetting in continual few-shot learning. Extensive experiments on CIFAR100, miniImageNet, CUB200, and an industrial dataset with a long-tail distribution demonstrate the advantages of our CoSR model over state-of-the-art methods.
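
To make the two core ideas concrete, the following is a minimal PyTorch sketch of (a) a Transformer that refines visual prototypes by attending over semantic concept embeddings and (b) a semantic-level regularization that pulls semantically similar class prototypes together and pushes dissimilar ones apart. All module names, dimensions, and the exact loss form here are illustrative assumptions; the abstract does not specify CoSR's actual architecture or loss, so this is one plausible instantiation rather than the paper's implementation.

```python
# Illustrative sketch only: names, shapes, and the loss form are assumptions,
# not the paper's actual CoSR formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeAdapter(nn.Module):
    """Refines visual class prototypes by attending over semantic concepts."""

    def __init__(self, dim: int = 512, num_heads: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(
        self, visual_protos: torch.Tensor, semantic_embeds: torch.Tensor
    ) -> torch.Tensor:
        # visual_protos:   (num_classes, dim) prototypes from the backbone
        # semantic_embeds: (num_classes, dim) e.g. projected GloVe/BERT vectors
        tokens = torch.cat([visual_protos, semantic_embeds], dim=0).unsqueeze(0)
        adapted = self.encoder(tokens).squeeze(0)
        # Keep only the (now semantics-aware) visual prototype tokens.
        return adapted[: visual_protos.size(0)]


def semantic_regularization(
    protos: torch.Tensor, semantic_embeds: torch.Tensor, margin: float = 10.0
) -> torch.Tensor:
    """Contrastive-style loss: semantically similar classes attract,
    dissimilar classes repel (one plausible reading of the abstract)."""
    # Pairwise semantic similarity in [0, 1], used as attraction weights.
    sem_sim = F.cosine_similarity(
        semantic_embeds.unsqueeze(1), semantic_embeds.unsqueeze(0), dim=-1
    ).clamp(min=0.0)
    proto_dist = torch.cdist(protos, protos)  # pairwise L2 distances
    attract = sem_sim * proto_dist.pow(2)
    repel = (1.0 - sem_sim) * F.relu(margin - proto_dist).pow(2)
    return (attract + repel).mean()
```

A contrastive-style weighting is used here because it directly encodes the stated behavior (similar semantics yield small prototype distances, dissimilar semantics are pushed beyond a margin); in training, this regularizer would typically be added to the classification loss over the adapted prototypes with a tunable coefficient.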
