Experience Replay for Continual Learning

Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and, ideally, generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a major hurdle to learning from non-stationary data or sequences of new tasks, and it prevents networks from continually accumulating knowledge and skills. We examine this issue in the context of reinforcement learning, in a setting where an agent is exposed to tasks in a sequence. Unlike most other work, we do not provide the model with an explicit indication of task boundaries, which is the most general setting for a learning agent exposed to continuous experience. While various methods to counteract catastrophic forgetting have recently been proposed, we explore a straightforward, general, and seemingly overlooked solution: using experience replay buffers for all past events, combined with a mixture of on- and off-policy learning that leverages behavioral cloning. We show that this strategy can still learn new tasks quickly while substantially reducing catastrophic forgetting in both the Atari and DMLab domains, even matching the performance of methods that require task identities. When buffer storage is constrained, we confirm that a simple mechanism for randomly discarding data allows a limited-size buffer to perform almost as well as an unbounded one.
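
To make the bounded-buffer idea concrete, the sketch below shows one standard way a fixed-capacity buffer can retain an approximately uniform random sample of all experience seen so far by discarding data at random (reservoir sampling). This is an illustrative assumption about the "randomly discarding data" mechanism, not the paper's exact implementation; the class and method names are hypothetical.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity replay buffer with random discarding.

    Once full, each newly offered item replaces a uniformly chosen slot
    with probability capacity / items_seen, so the stored contents remain
    an approximately uniform sample of the whole experience stream.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.items_seen = 0  # total items offered to the buffer so far

    def add(self, item):
        self.items_seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            # Keep the new item with probability capacity / items_seen.
            slot = random.randrange(self.items_seen)
            if slot < self.capacity:
                self.storage[slot] = item

    def sample(self, batch_size):
        # Uniform minibatch of stored experience for off-policy replay.
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```

Under the mixed on- and off-policy scheme described above, each training batch would presumably combine freshly generated on-policy trajectories with trajectories sampled from such a buffer, with the replayed data additionally regularized by a behavioral cloning term toward the policy outputs recorded at storage time.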