Continual Reinforcement Learning with Multi-Timescale Replay

In this paper, we propose a multi-timescale replay (MTR) buffer to improve continual learning in RL agents faced with environments that change continuously over time at timescales unknown to the agent. The basic MTR buffer comprises a cascade of sub-buffers that accumulate experiences at different timescales, enabling the agent to improve the trade-off between adaptation to new data and retention of old knowledge. We also combine the MTR framework with invariant risk minimization, encouraging the agent to learn a policy that is robust across the various environments it encounters over time. The MTR methods are evaluated in three continual learning settings on two continuous control tasks and, in many cases, show improvement over the baselines. A minimal illustrative sketch of such a cascade of sub-buffers is given below.
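The sketch below is one plausible reading of the cascade described above, not the paper's exact mechanism: experiences enter the fastest sub-buffer, and when a sub-buffer overflows, the evicted (oldest) experience is passed down to the next sub-buffer with some probability, so deeper sub-buffers accumulate data at progressively slower timescales. The class name, the per-buffer capacity, and the pass-down probability `pass_prob` are illustrative assumptions.

```python
import random
from collections import deque


class MultiTimescaleReplay:
    """Sketch of a multi-timescale replay (MTR) buffer as a cascade of
    FIFO sub-buffers. Assumed scheme: overflow from one sub-buffer is
    probabilistically cascaded into the next, so later sub-buffers hold
    progressively older experiences."""

    def __init__(self, num_buffers=4, capacity_per_buffer=10_000, pass_prob=0.5):
        self.buffers = [deque() for _ in range(num_buffers)]
        self.capacity = capacity_per_buffer
        self.pass_prob = pass_prob  # assumed probability of cascading an evicted item

    def add(self, experience):
        item = experience
        for buf in self.buffers:
            buf.append(item)
            if len(buf) <= self.capacity:
                return
            evicted = buf.popleft()          # oldest experience overflows this sub-buffer
            if random.random() > self.pass_prob:
                return                       # drop it; otherwise cascade it downward
            item = evicted

    def sample(self, batch_size):
        # Uniform sampling over the union of sub-buffers mixes recent and old data,
        # which is where the adaptation/retention trade-off is made.
        pool = [x for buf in self.buffers for x in buf]
        return random.sample(pool, min(batch_size, len(pool)))
```

As a rough usage pattern, an off-policy agent (e.g. SAC) would call `add` after every environment step and draw minibatches with `sample` for each gradient update; tuning `pass_prob` and the number of sub-buffers would shift the balance between adapting to new data and retaining old knowledge.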
