DisCoRL: Continual Reinforcement Learning via Policy Distillation

Multi-task reinforcement learning poses two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. Continual reinforcement learning adds a third challenge: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a 3-wheel omni-directional robot. Moreover, we test the robustness of our approach by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
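The abstract gives no implementation detail, so the following is only a minimal sketch of the policy-distillation step such an approach builds on (in the spirit of Rusu et al.'s policy distillation): a student network is trained to match a frozen teacher's action distribution on states replayed from each task, which is what lets one model absorb several policies without overwriting earlier ones. Everything below is an assumption rather than the authors' code: the names (`distillation_loss`, `distill`, `state_loader`), the PyTorch setup, and the temperature value; the input states would here be learned state-representation features rather than raw pixels.

```python
# Hypothetical sketch of a policy-distillation step; not the authors' code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=0.01):
    """KL divergence from the teacher's to the student's action distribution.
    A low temperature sharpens the teacher's preferred action, as in
    Rusu et al.'s policy-distillation formulation."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def distill(student, teacher, state_loader, optimizer):
    """One distillation pass: the student imitates a frozen teacher on
    states replayed from that teacher's task, so policies learned on
    earlier tasks are preserved when a new task is distilled in."""
    teacher.eval()
    for states in state_loader:  # states: learned SRL features per task
        with torch.no_grad():
            teacher_logits = teacher(states)
        loss = distillation_loss(student(states), teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In a continual setting this loop would run once per task in the sequence, with one frozen teacher per task and a single shared student, which also explains how the final policy can infer which task to run: it has been trained on the state distributions of all of them.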
