Differentiable MPC for End-to-end Planning and Control

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we learn the cost and dynamics of a controller end-to-end. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
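
As a concrete illustration of the differentiation step, the following is a minimal sketch; the notation $\tau_t = (x_t, u_t)$, $C_{\theta,t}$, $c_{\theta,t}$, $F_{\theta,t}$, $f_{\theta,t}$, and $\lambda_t$ is illustrative rather than taken verbatim from the paper. At a fixed point of the controller, the nonlinear MPC problem is locally replaced by a convex time-varying LQR problem

$$
\tau_{1:T}^\star(\theta) \;=\; \operatorname*{argmin}_{\tau_{1:T}} \;\sum_{t=1}^{T} \tfrac{1}{2}\,\tau_t^\top C_{\theta,t}\,\tau_t + c_{\theta,t}^\top \tau_t
\quad \text{s.t.}\quad x_1 = x_{\mathrm{init}},\;\; x_{t+1} = F_{\theta,t}\,\tau_t + f_{\theta,t}.
$$

Its KKT conditions form a linear system in the primal-dual solution $(\tau^\star, \lambda^\star)$: schematically, stationarity

$$
C_{\theta,t}\,\tau_t^\star + c_{\theta,t} + F_{\theta,t}^\top \lambda_t^\star - \begin{bmatrix}\lambda_{t-1}^\star \\ 0\end{bmatrix} = 0
$$

together with the dynamics constraints. Implicitly differentiating this linear system with respect to $\theta$ yields $\partial \tau^\star / \partial \theta$ without unrolling or storing the solver's iterations, so a loss on the resulting controls (for instance, an imitation loss against expert actions) can be backpropagated into the cost and dynamics parameters.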
