DisCoRL: Continual Reinforcement Learning via Policy Distillation

Multi-task reinforcement learning poses two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. Continual reinforcement learning adds a third challenge: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a 3-wheel omni-directional robot. Moreover, we test the robustness of our approach by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
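The abstract gives no implementation detail, so the following is only a minimal sketch of the policy-distillation step such an approach builds on (in the spirit of Rusu et al.'s policy distillation): a student network is trained to match a frozen teacher's action distribution on states replayed from each task, which is what lets one model absorb several policies without overwriting earlier ones. Everything below is an assumption rather than the authors' code: the names (`distillation_loss`, `distill`, `state_loader`), the PyTorch setup, and the temperature value; the input states would here be learned state-representation features rather than raw pixels.

```python
# Hypothetical sketch of a policy-distillation step; not the authors' code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=0.01):
    """KL divergence from the teacher's to the student's action distribution.
    A low temperature sharpens the teacher's preferred action, as in
    Rusu et al.'s policy-distillation formulation."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def distill(student, teacher, state_loader, optimizer):
    """One distillation pass: the student imitates a frozen teacher on
    states replayed from that teacher's task, so policies learned on
    earlier tasks are preserved when a new task is distilled in."""
    teacher.eval()
    for states in state_loader:  # states: learned SRL features per task
        with torch.no_grad():
            teacher_logits = teacher(states)
        loss = distillation_loss(student(states), teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In a continual setting this loop would run once per task in the sequence, with one frozen teacher per task and a single shared student, which also explains how the final policy can infer which task to run: it has been trained on the state distributions of all of them.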
