Policy Consolidation for Continual Reinforcement Learning

We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is \textit{agnostic} to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries, and can adapt in \textit{continuously} changing environments. In our \textit{policy consolidation} model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent's policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.
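To make the mechanism concrete, below is a minimal sketch of the consolidation cascade in PyTorch, assuming diagonal-Gaussian policies for continuous control. The cascade depth, the geometric weighting factor `omega`, the network sizes, and the names `GaussianPolicy`, `PolicyCascade`, and `consolidation_loss` are illustrative assumptions, not the paper's exact formulation, which folds these KL constraints into a PPO-style surrogate objective.

```python
# Hedged sketch of a policy-consolidation cascade (assumptions noted above).
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Small MLP outputting a diagonal Gaussian over actions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.body(obs)
        return Normal(self.mu(h), self.log_std.exp())


class PolicyCascade(nn.Module):
    """Cascade of policies: index 0 is the visible (acting) policy;
    deeper policies remember its history at longer timescales."""

    def __init__(self, obs_dim, act_dim, depth=4, omega=4.0):
        super().__init__()
        self.policies = nn.ModuleList(
            GaussianPolicy(obs_dim, act_dim) for _ in range(depth)
        )
        self.omega = omega  # geometric growth of coupling strength

    def consolidation_loss(self, obs):
        """Sum of KL terms coupling adjacent policies in the cascade.

        Deeper links carry geometrically larger weights (omega ** k),
        so deep policies change slowly and pull the acting policy
        back toward its own past behaviour.
        """
        dists = [p(obs) for p in self.policies]
        loss = 0.0
        for k in range(len(dists) - 1):
            w = self.omega ** k
            # Symmetric coupling: each policy is regularised toward
            # its neighbours on both sides of the link.
            loss = loss + w * (
                kl_divergence(dists[k], dists[k + 1]).mean()
                + kl_divergence(dists[k + 1], dists[k]).mean()
            )
        return loss
```

In training, this consolidation term would be added to the agent's usual RL loss (e.g., a clipped PPO objective computed for the acting policy), with all cascade policies updated jointly. The geometric weights make deep policies effectively slow, so they store the policy's history at a range of timescales while the acting policy remains plastic enough to keep learning.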
