Reducing Representation Drift in Online Continual Learning

In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Previous work in this setting often tries to reduce catastrophic forgetting by limiting changes in the space of model parameters. In this work, we instead focus on the drift in representations of previously observed data that arises when new, previously unobserved classes appear in the incoming data stream and must be distinguished from earlier ones. Starting from a popular approach, experience replay, we consider metric-learning-based loss functions which, when adjusted to select negative samples appropriately, nudge the learned representations to be more robust to future classes. We show that this selection of negatives is in fact critical for reducing representation drift of previously observed data. Motivated by this, we further introduce a simple adjustment to the standard cross-entropy loss used in prior experience-replay methods that achieves a similar effect. Our approach directly improves the performance of experience replay in this setting, obtaining state-of-the-art results on several existing benchmarks in online continual learning while remaining efficient in both memory and compute. We release an implementation of our experiments at https://github.com/naderAsadi/AML.
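To make the idea concrete, below is a minimal PyTorch sketch of one way such an asymmetric adjustment to the cross-entropy loss could look: incoming-stream samples only compete against the classes present in the incoming mini-batch (so logits of previously seen classes are masked out and their representations are perturbed less), while replayed samples keep the standard cross-entropy over all classes. The function and variable names (masked_replay_loss, logits_in, logits_re) are illustrative assumptions, not taken from the released implementation.

```python
import torch
import torch.nn.functional as F

def masked_replay_loss(logits_in, y_in, logits_re, y_re, num_classes):
    """Sketch of an asymmetric cross-entropy for experience replay.

    logits_in, y_in: logits and labels for the incoming stream mini-batch.
    logits_re, y_re: logits and labels for the mini-batch drawn from the replay buffer.
    """
    # Classes that actually appear in the incoming batch.
    present = torch.unique(y_in)

    # Additive mask: 0 for present classes, -inf for all others,
    # so absent (mostly old) classes receive zero probability mass.
    mask = torch.full((num_classes,), float('-inf'), device=logits_in.device)
    mask[present] = 0.0

    # Incoming samples compete only against classes in their own batch.
    loss_in = F.cross_entropy(logits_in + mask, y_in)

    # Replayed samples use the full cross-entropy over all classes.
    loss_re = F.cross_entropy(logits_re, y_re)

    return loss_in + loss_re
```

A replay-based training step would then draw one mini-batch from the stream and one from the buffer, compute both sets of logits with the same network, and backpropagate through this combined loss; the masking only changes which negatives the incoming samples are contrasted against, not the model or the buffer.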
