Regularization Shortcomings for Continual Learning

In most machine learning algorithms, training data are assumed to be independent and identically distributed (iid); when this assumption does not hold, performance degrades. The best-known failure mode under non-iid data distributions is \say{catastrophic forgetting}, and the algorithms designed to cope with it form the \textit{Continual Learning} research field. In this article, we study \textit{regularization}-based approaches to continual learning. We show that these approaches cannot learn to discriminate between classes from different tasks in an elementary continual benchmark: the class-incremental setting. We prove this shortcoming with a theoretical argument and illustrate it with examples and experiments.
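To make the class-incremental setting and the regularization-based family concrete, here is a minimal sketch: a classification dataset is split into tasks with disjoint classes, and a small network is trained sequentially with an EWC-style quadratic penalty, one canonical regularization method. Everything specific here is an illustrative assumption rather than this paper's exact setup: the \texttt{MLP} architecture, the diagonal Fisher estimate, the penalty weight \texttt{lam}, and the random stand-in data.

\begin{verbatim}
# Minimal class-incremental sketch with an EWC-style quadratic penalty.
# Architecture, hyperparameters, and data are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def split_by_classes(inputs, labels, class_groups):
    """Split a dataset into tasks, each holding a disjoint group of classes."""
    tasks = []
    for classes in class_groups:
        mask = torch.isin(labels, torch.tensor(classes))
        tasks.append((inputs[mask], labels[mask]))
    return tasks


class MLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, x):
        return self.net(x)


def fisher_diagonal(model, inputs, labels):
    """Diagonal Fisher estimate used to weight the quadratic penalty."""
    model.zero_grad()
    F.cross_entropy(model(inputs), labels).backward()
    return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}


def ewc_penalty(model, old_params, fisher):
    """Quadratic penalty anchoring parameters to their values after the previous task."""
    return sum(
        (fisher[n] * (p - old_params[n]) ** 2).sum()
        for n, p in model.named_parameters()
    )


def train_task(model, data, old_params=None, fisher=None, lam=100.0, epochs=5):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = data
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        if old_params is not None:
            loss = loss + lam * ewc_penalty(model, old_params, fisher)
        loss.backward()
        opt.step()


# Task 1 sees only classes {0, 1}, task 2 only {2, 3}; at test time the model
# must discriminate among all four classes without being told the task.
if __name__ == "__main__":
    x = torch.randn(400, 784)           # random stand-in for real images
    y = torch.randint(0, 4, (400,))
    tasks = split_by_classes(x, y, [[0, 1], [2, 3]])

    model = MLP(n_classes=4)
    train_task(model, tasks[0])
    old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    fisher = fisher_diagonal(model, *tasks[0])
    train_task(model, tasks[1], old_params, fisher)
\end{verbatim}

In this kind of setup, no batch ever contains classes from two different tasks, so the penalty only constrains how far the weights move; it gives no training signal for telling, say, class 0 from class 2, which is the shortcoming the article analyzes.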
