New Insights on Reducing Abrupt Representation Change in Online Continual Learning

In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream and must be distinguished from previous ones. We shed new light on this problem by showing that applying ER causes the newly added classes' representations to overlap significantly with those of the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method that mitigates the issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective, especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks.
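
To make the setup concrete, below is a minimal sketch of experience replay combined with one plausible realization of an asymmetric update: the cross-entropy on the incoming batch is restricted (via logit masking) to the classes present in that batch, so gradients do not push down logits of previously learned classes, while the replayed batch is trained with the ordinary loss over all classes. The `ReservoirBuffer` and `asymmetric_replay_step` names, the buffer bookkeeping, and the exact masking choice are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of online experience replay with an asymmetric cross-entropy update.
# Assumption: the asymmetry is realized by masking incoming-batch logits to the
# classes present in that batch; replayed samples use the standard loss.
import random
import torch
import torch.nn.functional as F


class ReservoirBuffer:
    """Fixed-size memory filled with reservoir sampling over the stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []          # list of (x, y) pairs
        self.num_seen = 0

    def add(self, x, y):
        for xi, yi in zip(x, y):
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:
                j = random.randint(0, self.num_seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)
            self.num_seen += 1

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def asymmetric_replay_step(model, optimizer, x_in, y_in, buffer):
    """One online update on an incoming batch plus a replayed batch."""
    logits_in = model(x_in)

    # Asymmetric part: mask out logits of classes absent from the incoming
    # batch, so new classes adapt to the old ones rather than displacing them.
    mask = torch.full_like(logits_in, float('-inf'))
    mask[:, y_in.unique()] = 0.0
    loss = F.cross_entropy(logits_in + mask, y_in)

    # Replayed samples use the ordinary cross-entropy over all classes.
    if len(buffer.data) > 0:
        x_re, y_re = buffer.sample(x_in.size(0))
        loss = loss + F.cross_entropy(model(x_re), y_re)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    buffer.add(x_in, y_in)
    return loss.item()
```

In this sketch, masking with `-inf` zeroes the softmax probability (and hence the gradient) for absent classes on incoming data, which is one simple way to prevent new classes from overwriting the representations of older ones at task boundaries.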
