Practical Reinforcement Learning For MPC: Learning from sparse objectives in under an hour on a real robot

Model Predictive Control (MPC) is a powerful control technique that handles constraints, takes the system's dynamics into account, and optimizes for a given cost function. In practice, however, it often requires an expert to craft and tune this cost function, finding trade-offs between different state penalties that satisfy simple high-level objectives. In this paper, we use Reinforcement Learning, and in particular value learning, to approximate the value function given only high-level objectives, which can be sparse and binary. Building upon previous work, we present improvements that allowed us to successfully deploy the method on a real-world unmanned ground vehicle. Our experiments show that our method can learn the cost function from scratch and without human intervention, while reaching a performance level similar to that of an expert-tuned MPC. We perform a quantitative comparison of these methods with standard MPC approaches, both in simulation and on the real robot.
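As a rough illustration of the idea only (a toy 1D grid with tabular TD(0), standing in for the paper's learned value function and nonlinear MPC; all names and sizes are assumptions): a value function learned from a sparse binary reward can serve as the terminal cost of a short-horizon planner, so no state penalties need to be tuned by hand.

```python
import numpy as np

# Illustrative sketch only (toy setup, not the paper's actual algorithm):
# tabular TD(0) value learning from a sparse binary reward on a 1D grid,
# with the learned value used as the terminal cost of a short-horizon planner.

N = 21                # states 0..20; sparse reward only on reaching the goal
GOAL = N - 1
GAMMA = 0.95          # discount factor
ALPHA = 0.5           # TD learning rate

def step(s, a):
    """Move left (a=-1) or right (a=+1); reward is binary and sparse."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

# --- Learn V from random exploration, with no hand-crafted cost shaping ---
V = np.zeros(N)
rng = np.random.default_rng(0)
for _ in range(5000):
    s = int(rng.integers(0, N))
    a = int(rng.choice([-1, 1]))
    s2, r = step(s, a)
    target = r if s2 == GOAL else r + GAMMA * V[s2]   # goal treated as terminal
    V[s] += ALPHA * (target - V[s])

# --- Short-horizon planner: learned V replaces the expert-tuned terminal cost ---
def lookahead(s, h):
    """Best discounted return over h steps, bootstrapped with V at the horizon."""
    if h == 0:
        return V[s]
    best = -np.inf
    for a in (-1, 1):
        s2, r = step(s, a)
        best = max(best, r + GAMMA * lookahead(s2, h - 1))
    return best

def plan(s, horizon=3):
    """First action of the horizon-limited plan (the MPC-style receding step)."""
    scores = {a: step(s, a)[1] + GAMMA * lookahead(step(s, a)[0], horizon - 1)
              for a in (-1, 1)}
    return max(scores, key=scores.get)
```

For example, `plan(GOAL - 1)` selects the action toward the sparse reward even though no per-state cost was ever specified by hand; the real method replaces the table with a learned value model and the exhaustive lookahead with an MPC solver.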
