Online Coreset Selection for Rehearsal-based Continual Learning

A dataset provides crucial evidence for describing a task. However, not all data points are equally useful: some are more representative or informative than others. This unequal importance can have a large impact in rehearsal-based continual learning, where we store a subset of the training examples (a coreset) to be replayed later to alleviate catastrophic forgetting. In continual learning, the quality of the samples stored in the coreset directly affects the model's effectiveness and efficiency. Coreset selection becomes even more important under realistic settings, such as imbalanced continual learning or noisy data scenarios. To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative examples as a coreset at each iteration and trains on them in an online manner. Our proposed method maximizes the model's adaptation to the target dataset while selecting samples with high affinity to past tasks, which directly inhibits catastrophic forgetting. We validate the effectiveness of our coreset selection mechanism over various standard, imbalanced, and noisy datasets against strong continual learning baselines, demonstrating that it improves task adaptation and prevents catastrophic forgetting in a sample-efficient manner.
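
To make the selection idea concrete, below is a minimal, hedged sketch (not the authors' exact criterion) of how a gradient-similarity score could pick representative examples from the current minibatch while favoring samples with high affinity to a stored coreset. The function names (per_sample_grads, select_coreset) and the additive scoring rule are illustrative assumptions.

```python
# Illustrative sketch only: gradient-similarity-based coreset selection in the
# spirit of OCS. Scoring rule and helper names are assumptions, not the paper's
# exact formulation.
import torch
import torch.nn.functional as F


def per_sample_grads(model, x, y):
    """Return a (batch, n_params) tensor of per-example loss gradients."""
    grads = []
    for xi, yi in zip(x, y):
        model.zero_grad()
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    return torch.stack(grads)


def select_coreset(model, x, y, coreset_x, coreset_y, k):
    """Pick the k minibatch examples whose gradients align best with the
    minibatch mean gradient (representativeness) and, if a coreset already
    exists, with the coreset's mean gradient (affinity to past tasks)."""
    g = per_sample_grads(model, x, y)                    # (B, P)
    mean_g = g.mean(dim=0, keepdim=True)                 # current-task direction
    score = F.cosine_similarity(g, mean_g)               # representativeness
    if len(coreset_x) > 0:
        ref_g = per_sample_grads(model, coreset_x, coreset_y).mean(dim=0, keepdim=True)
        score = score + F.cosine_similarity(g, ref_g)    # affinity to past tasks
    idx = score.topk(k).indices
    return x[idx], y[idx]
```

In an online training loop, one would call select_coreset on each incoming minibatch and merge the selected examples into the replay buffer before taking the gradient step, so that both the current update and future rehearsal use only the highest-scoring samples.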
