Regularization Shortcomings for Continual Learning

Most machine learning algorithms assume that training data are independent and identically distributed (iid); when this assumption is violated, their performance degrades. A well-known failure mode under non-iid data distributions is \say{catastrophic forgetting}, and the algorithms designed to cope with it form the \textit{Continual Learning} research field. In this article, we study \textit{regularization}-based approaches to continual learning. We show that these approaches cannot learn to discriminate classes from different tasks in an elementary continual learning benchmark: the class-incremental setting. We give a theoretical argument for this shortcoming and illustrate it with examples and experiments.
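For readers unfamiliar with the regularization-based family studied here, the sketch below illustrates the typical structure of such methods: the task loss is augmented with an EWC-style quadratic penalty that anchors parameters to their values after the previous task, weighted by a per-parameter importance estimate (often the diagonal Fisher information). This is a minimal sketch under stated assumptions, not the paper's implementation; the names `model`, `old_params`, `importance`, and `lam` are illustrative placeholders.

```python
import torch

def ewc_penalty(model, old_params, importance):
    """Quadratic penalty keeping parameters close to their previous-task values.

    old_params and importance are dicts keyed by parameter name, holding the
    parameter values saved after the previous task and their importance weights
    (e.g. a diagonal Fisher estimate), respectively.
    """
    penalty = torch.tensor(0.0)
    for name, param in model.named_parameters():
        penalty = penalty + (importance[name] * (param - old_params[name]) ** 2).sum()
    return penalty

def continual_loss(task_loss, model, old_params, importance, lam=1.0):
    """Current-task loss plus the regularization term used by EWC-like methods."""
    return task_loss + (lam / 2.0) * ewc_penalty(model, old_params, importance)
```

Note that the penalty only constrains parameters toward a previous solution; it does not by itself provide any signal for separating classes of the new task from classes of earlier tasks, which is the shortcoming analyzed in this article.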
