Entropy-based Sample Selection for Online Continual Learning

Deep neural networks (DNNs) suffer from catastrophic forgetting, a rapid drop in performance when they are trained on a sequence of tasks and only the data of the most recent task is available. Most previous research has focused on the setting in which all data of a task is available at once and the boundaries between tasks are known. In this paper, we focus on the online setting, where data arrives one sample at a time or in small batches, ordered by task, and task boundaries are unknown. Avoiding catastrophic forgetting in this setting is of great interest, since it would allow DNNs to accumulate knowledge without storing all previously seen data, even when task boundaries are not given. To this end, we propose a novel rehearsal algorithm for online continual learning that is derived from basic concepts of information theory. We demonstrate on commonly used data sets that our method avoids catastrophic forgetting, achieves results competitive with the current state of the art, and even outperforms it in most cases.
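
The abstract describes the selection criterion only at a high level ("derived from basic concepts of information theory"). The following minimal Python sketch illustrates one way an entropy-based rehearsal memory could be realized, assuming, purely for illustration, that entropy is estimated over the class labels stored in the buffer; the class name EntropyBuffer, the label-entropy score, and the swap rule are hypothetical and are not taken from the paper.

    import math
    import random
    from collections import Counter

    class EntropyBuffer:
        """Hypothetical rehearsal memory that favors a high-entropy label distribution.

        Illustration only: the exact criterion used in the paper is not stated in
        the abstract; here the Shannon entropy of the stored class labels serves
        as the selection score.
        """

        def __init__(self, capacity):
            self.capacity = capacity
            self.samples = []  # list of (input, label) pairs

        @staticmethod
        def _label_entropy(labels):
            # Shannon entropy (in nats) of the empirical label distribution.
            counts = Counter(labels)
            total = len(labels)
            return -sum((c / total) * math.log(c / total) for c in counts.values())

        def add(self, x, y):
            # While the buffer has free space, store every incoming example.
            if len(self.samples) < self.capacity:
                self.samples.append((x, y))
                return
            # Buffer full: try every possible swap of a stored example for the
            # new one and keep the configuration with the highest label entropy.
            labels = [label for _, label in self.samples]
            best_entropy = self._label_entropy(labels)
            best_idx = None
            for i in range(len(self.samples)):
                candidate = labels[:i] + labels[i + 1:] + [y]
                entropy = self._label_entropy(candidate)
                if entropy > best_entropy:
                    best_entropy, best_idx = entropy, i
            if best_idx is not None:
                self.samples[best_idx] = (x, y)

        def sample_batch(self, batch_size):
            # Rehearsal step: draw a random mini-batch of stored examples to be
            # replayed alongside the current task's data.
            k = min(batch_size, len(self.samples))
            return random.sample(self.samples, k)

In use, such a buffer would be updated with every incoming example and queried for a small rehearsal batch that is mixed into each training step, so the network keeps revisiting earlier tasks without requiring explicit task boundaries.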
