Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning

In continual learning (CL), balancing effective adaptation with resistance to catastrophic forgetting is a central challenge. Many of the best-performing recent methods rely on some form of prior-task data, e.g., a replay buffer, to tackle catastrophic forgetting. However, access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive or proprietary. To overcome the need for previous tasks' data, in this work we start from strong representation learning methods that have been shown to be less prone to forgetting. We propose a holistic approach that jointly learns the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. Specifically, samples are mapped to an embedding space in which the representations are learned with a supervised contrastive loss. Class prototypes are evolved continually in the same latent space, enabling both learning and prediction at any point. To adapt the prototypes continually without keeping any prior-task data, we propose a novel distillation loss that constrains class prototypes to maintain their relative similarities with respect to new-task data. This method yields state-of-the-art performance in the task-incremental setting, outperforming methods that rely on large amounts of stored data, and provides strong performance in the class-incremental setting without using any stored data points.
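The core mechanism described above can be illustrated with a minimal sketch: old-class prototypes are kept trainable, but are constrained so that their relative similarities to embeddings of new-task samples stay close to the similarities computed with a frozen snapshot from the previous task, so no stored samples are needed. The code below is an illustrative assumption of how such a relation-distillation term could look; the function and parameter names (relation_distillation_loss, temperature, etc.) are hypothetical and not taken from the authors' implementation.

```python
import torch
import torch.nn.functional as F

def relation_distillation_loss(z_new, old_protos, old_protos_frozen, temperature=0.1):
    """Hypothetical prototype-sample relation distillation sketch.

    z_new:             (B, D) embeddings of current-task samples
    old_protos:        (C_old, D) trainable prototypes of previously seen classes
    old_protos_frozen: (C_old, D) frozen snapshot of those prototypes from the
                       end of the previous task
    """
    z = F.normalize(z_new, dim=1)
    p_cur = F.normalize(old_protos, dim=1)
    p_old = F.normalize(old_protos_frozen, dim=1)

    # Distribution of similarities between each new-task sample and the old
    # prototypes, under the frozen snapshot (target) and the current ones.
    target = F.softmax(z @ p_old.t() / temperature, dim=1).detach()
    logits = F.log_softmax(z @ p_cur.t() / temperature, dim=1)

    # Constrain current prototypes to preserve the relative similarities,
    # using only new-task data (replay-free).
    return F.kl_div(logits, target, reduction="batchmean")
```

In a full training loop, this term would be added to the supervised contrastive loss on the current task, so the representation and the prototypes are updated jointly while old-class relations are preserved.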
