Efficient transfer learning and online adaptation with latent variable models for continuous control

Traditional model-based RL relies on hand-specified or learned models of transition dynamics of the environment. These methods are sample efficient and facilitate learning in the real world but fail to generalize to subtle variations in the underlying dynamics, e.g., due to differences in mass, friction, or actuators across robotic agents or across time. We propose using variational inference to learn an explicit latent representation of unknown environment properties that accelerates learning and facilitates generalization on novel environments at test time. We use Online Bayesian Inference of these learned latents to rapidly adapt online to changes in environments without retaining large replay buffers of recent data. Combined with a neural network ensemble that models dynamics and captures uncertainty over dynamics, our approach demonstrates positive transfer during training and online adaptation on the continuous control task HalfCheetah.

[1]  C. Rasmussen,et al.  Improving PILCO with Bayesian Neural Network Dynamics Models , 2016 .

[2]  Karol Hausman,et al.  Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.

[3]  Sergey Levine,et al.  Learning to Adapt: Meta-Learning for Model-Based Control , 2018, ArXiv.

[4]  Finale Doshi-Velez,et al.  Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes , 2017, AAAI.

[5]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[6]  Sergey Levine,et al.  Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[7]  David Q. Mayne,et al.  Model predictive control: Recent developments and future promise , 2014, Autom..

[8]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[9]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[10]  Finale Doshi-Velez,et al.  Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations , 2013, IJCAI.

[11]  Mitsuo Kawato,et al.  MOSAIC Model for Sensorimotor Learning and Control , 2001, Neural Computation.

[12]  Carl E. Rasmussen,et al.  PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.

[13]  Sergey Levine,et al.  Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.

[14]  Eduardo F. Camacho,et al.  Model Predictive Controllers , 2007 .

[15]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[16]  Shie Mannor,et al.  A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..