Learning Dynamics Models for Model Predictive Agents

Model-Based Reinforcement Learning involves learning a dynamics model from data and then using this model to optimise behaviour, most often with an online planner. Much of the recent research along these lines [1, 2, 3] commits to a particular set of design choices spanning problem definition, model learning, and planning. Because each paper bundles multiple contributions, it is difficult to evaluate the effect of any single choice. This paper sets out to disambiguate the role of different design choices for learning dynamics models by comparing their performance to planning with a ground-truth model: the simulator. First, we collect a rich dataset from the training sequence of a model-free agent on five domains of the DeepMind Control Suite. Second, we train feed-forward dynamics models in a supervised fashion and evaluate planner performance while varying and analysing different model design choices, including ensembling, stochasticity, multi-step training, and timestep size. Besides the quantitative analysis, we describe a set of qualitative findings, rules of thumb, and future research directions for planning with learned dynamics models. Videos of the results are available at https://sites.google.com/view/learning-better-models.
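
To make the pipeline concrete, below is a minimal PyTorch sketch of the two ingredients the abstract refers to: a feed-forward dynamics model trained with a multi-step prediction loss (optionally as an ensemble), and a cross-entropy-method planner in the spirit of [7] that rolls the model forward. All names (`DynamicsModel`, `multi_step_loss`, `cem_plan`), network sizes, hyper-parameters, the delta-prediction parameterisation, and the toy reward function are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DynamicsModel(nn.Module):
    """Feed-forward model predicting the next state from (state, action)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predicting the state *delta* rather than the next state directly
        # is a common parameterisation for near-identity dynamics.
        return state + self.net(torch.cat([state, action], dim=-1))


def multi_step_loss(model, states, actions, horizon):
    """Unroll the model for `horizon` steps and average the per-step MSE.

    states:  (batch, horizon + 1, state_dim) ground-truth trajectory snippet
    actions: (batch, horizon, action_dim)
    """
    pred, loss = states[:, 0], 0.0
    for t in range(horizon):
        pred = model(pred, actions[:, t])
        loss = loss + nn.functional.mse_loss(pred, states[:, t + 1])
    return loss / horizon


@torch.no_grad()
def cem_plan(model, reward_fn, state, horizon=10, pop=64, elites=6,
             iters=4, action_dim=1):
    """Cross-entropy-method planner over open-loop action sequences."""
    mean = torch.zeros(horizon, action_dim)
    std = torch.ones(horizon, action_dim)
    for _ in range(iters):
        # Sample a population of action sequences, score them by rolling
        # the learned model forward, then refit the sampling distribution
        # to the elite (highest-return) sequences.
        seqs = mean + std * torch.randn(pop, horizon, action_dim)
        returns = torch.zeros(pop)
        s = state.unsqueeze(0).expand(pop, -1)
        for t in range(horizon):
            s = model(s, seqs[:, t])
            returns += reward_fn(s)
        elite = seqs[returns.topk(elites).indices]
        mean, std = elite.mean(0), elite.std(0)
    return mean[0]  # execute the first action, then re-plan


# Toy usage on random data standing in for transitions logged from a
# model-free agent. An ensemble is simply several independently
# initialised models whose predictions can be averaged at planning time.
ensemble = [DynamicsModel(state_dim=4, action_dim=1) for _ in range(5)]
optims = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]
states, actions = torch.randn(32, 6, 4), torch.randn(32, 5, 1)
for model, opt in zip(ensemble, optims):
    opt.zero_grad()
    multi_step_loss(model, states, actions, horizon=5).backward()
    opt.step()
action = cem_plan(ensemble[0], lambda s: -(s ** 2).sum(-1), torch.randn(4))
```

Varying the unroll horizon of `multi_step_loss`, the ensemble size, and whether the model outputs a distribution rather than a point estimate correspond to the multi-step-training, ensembling, and stochasticity axes the paper studies; timestep size enters through how the trajectory snippets are sub-sampled.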

[1] Pieter Abbeel et al. Model-Ensemble Trust-Region Policy Optimization. ICLR, 2018.

[2] Eduardo D. Sontag et al. Neural Networks for Control. 1993.

[3] Christopher G. Atkeson et al. Estimation of Inertial Parameters of Manipulator Loads and Links. 1986.

[4] Martial Hebert et al. Improving Multi-Step Prediction of Learned Time Series Models. AAAI, 2015.

[5] Sergey Levine et al. Deep Dynamics Models for Learning Dexterous Manipulation. CoRL, 2019.

[6] Meire Fortunato et al. Learning Mesh-Based Simulation with Graph Networks. arXiv, 2020.

[7] Georg Martius et al. Sample-efficient Cross-Entropy Method for Real-time Planning. CoRL, 2020.

[8] Ruben Villegas et al. Learning Latent Dynamics for Planning from Pixels. ICML, 2019.

[9] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML, 2018.

[10] Raia Hadsell et al. Graph networks as learnable physics engines for inference and control. ICML, 2018.

[11] Mohammad Norouzi et al. Mastering Atari with Discrete World Models. ICLR, 2021.

[12] Jan Peters et al. Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning. ICLR, 2019.

[13] Jan Peters et al. A Differentiable Newton Euler Algorithm for Multi-body Model Learning. arXiv, 2020.

[14] Jürgen Schmidhuber et al. World Models. arXiv, 2018.

[15] G. Martin et al. Nonlinear model predictive control. American Control Conference, 1999.

[16] Jan Peters et al. Using model knowledge for learning inverse dynamics. IEEE International Conference on Robotics and Automation (ICRA), 2010.

[17] Herke van Hoof et al. Addressing Function Approximation Error in Actor-Critic Methods. ICML, 2018.

[18] Carl E. Rasmussen et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search. ICML, 2011.

[19] Sergey Levine et al. Low-Level Control of a Quadrotor With Deep Model-Based Reinforcement Learning. IEEE Robotics and Automation Letters, 2019.

[20] Pieter Abbeel et al. Learning vehicular dynamics, with application to modeling helicopters. NIPS, 2005.

[21] Nir Levine et al. Challenges of real-world reinforcement learning: definitions, benchmarks and analysis. Machine Learning, 2021.

[22] Jure Leskovec et al. Learning to Simulate Complex Physics with Graph Networks. ICML, 2020.

[23] Manfred Morari et al. Model predictive control: Theory and practice - A survey. Automatica, 1989.

[24] Sergey Levine et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. CoRL, 2018.

[25] Vikash Kumar et al. A Game Theoretic Framework for Model Based Reinforcement Learning. ICML, 2020.

[26] Sergey Levine et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning. IEEE International Conference on Robotics and Automation (ICRA), 2018.

[27] Jan Peters et al. Model Learning with Local Gaussian Process Regression. Advanced Robotics, 2009.

[28] Sergey Levine et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. NeurIPS, 2018.

[29] Sergey Levine et al. When to Trust Your Model: Model-Based Policy Optimization. NeurIPS, 2019.

[30] Yuval Tassa et al. Learning Continuous Control Policies by Stochastic Value Gradients. NIPS, 2015.

[31] Sergey Levine et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML, 2017.

[32] Nicolas Schweighofer et al. Local Online Support Vector Regression for Learning Control. International Symposium on Computational Intelligence in Robotics and Automation, 2007.

[33] Honglak Lee et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion. NeurIPS, 2018.

[34] Roberto Calandra et al. Objective Mismatch in Model-based Reinforcement Learning. L4DC, 2020.

[35] Andrew Gordon Wilson et al. On the model-based stochastic value gradient for continuous reinforcement learning. L4DC, 2021.

[36] Aaron M. Dollar et al. Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning. IEEE International Conference on Robotics and Automation (ICRA), 2021.

[37] Stefan Schaal et al. Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning. Applied Intelligence, 2002.

[38] Darwin G. Caldwell et al. Learning and Reproduction of Gestures by Imitation. IEEE Robotics & Automation Magazine, 2010.

[39] Oleg O. Sushkov et al. A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning. International Conference on Robotics and Automation (ICRA), 2019.

[40] Rowan McAllister et al. Model-Based Meta-Reinforcement Learning for Flight With Suspended Payloads. IEEE Robotics and Automation Letters, 2020.

[41] Thorsten Joachims et al. MOReL: Model-Based Offline Reinforcement Learning. NeurIPS, 2020.

[42] Jan Peters et al. Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning. IEEE International Conference on Robotics and Automation (ICRA), 2021.

[43] Kim D. Listmann et al. Deep Lagrangian Networks for end-to-end learning of energy-based control for under-actuated systems. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.

[44] Richard S. Sutton et al. Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bulletin, 1990.

[45] Richard S. Sutton et al. Reinforcement Learning: An Introduction. MIT Press, 1998.

[46] Jan Peters et al. Model learning for robot control: a survey. Cognitive Processing, 2011.

[47] Sergey Levine et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning. arXiv, 2018.

[48] Martin A. Riedmiller et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models. CoRL, 2019.