Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning remains a challenging topic in RL. Parameter-based transfer learning is generally done with an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge entirely, as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.
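To make the idea concrete, below is a minimal PyTorch sketch of fractional parameter transfer, under the assumption that a fraction of the trained source parameters is simply added to the freshly initialized target parameters (fully transferring corresponds to a fraction of 1, random initialization to a fraction of 0). The abstract names no framework, and the function name fractional_transfer and the parameter omega are illustrative, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    def fractional_transfer(target: nn.Module, source: nn.Module, omega: float = 0.2):
        """Add a fraction omega of the trained source parameters to the
        (randomly initialized) target parameters, instead of copying them
        fully (omega = 1) or discarding them (omega = 0).

        Assumes target and source share the same architecture; omega and
        the function name are illustrative, not from the paper."""
        with torch.no_grad():
            for p_target, p_source in zip(target.parameters(), source.parameters()):
                p_target.add_(omega * p_source)

    # Usage sketch: transfer a fraction of a trained component into a fresh one.
    source_net = nn.Linear(32, 1)   # stands in for a component trained on a source task
    target_net = nn.Linear(32, 1)   # freshly (randomly) initialized for the target task
    fractional_transfer(target_net, source_net, omega=0.2)

Because the random initialization of the target is retained, the transferred fraction biases learning toward the source solution without fixing the target to it.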
