An Online Learning Approach to Model Predictive Control

Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algorithms. Specifically, we propose a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm designed for non-stationary settings. Our algorithm, Dynamic Mirror Descent Model Predictive Control (DMD-MPC), represents a general family of MPC algorithms that includes many existing techniques as special instances. DMD-MPC also provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms. In our experiments, we demonstrate the flexibility of DMD-MPC, evaluating a set of new MPC algorithms on a simple simulated cartpole and on simulated and real-world aggressive driving tasks. Videos of the real-world experiments can be found at this https URL and this https URL.
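To make the MPC-as-online-learning idea concrete, the following is an illustrative sketch, not the authors' implementation: a DMD-MPC-style loop with a Euclidean mirror map, so the mirror-descent step reduces to a plain gradient step on the mean of a Gaussian sampling distribution over control sequences. The double-integrator dynamics, quadratic cost, step size, and `shift` warm-start rule are all toy assumptions chosen for the example.

```python
# Hedged sketch of a DMD-MPC-style receding-horizon loop.
# Assumptions (not from the paper): toy double-integrator dynamics,
# quadratic cost, Euclidean mirror map, likelihood-ratio gradient.
import numpy as np

def rollout_cost(x0, controls, dynamics, cost):
    """Total cost of applying a control sequence open-loop from state x0."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = dynamics(x, u)
    return total

def dmd_mpc_step(x0, theta, dynamics, cost, n_samples=64, sigma=0.3, lr=0.5):
    """One online-learning round: estimate the gradient of expected cost
    w.r.t. the mean control sequence theta with a likelihood-ratio
    (score-function) estimator, then take a mirror-descent step.
    With a Euclidean mirror map this is just a gradient step."""
    rng = np.random.default_rng(0)  # fixed seed keeps the sketch deterministic
    noise = rng.normal(0.0, sigma, size=(n_samples,) + theta.shape)
    costs = np.array([rollout_cost(x0, theta + eps, dynamics, cost)
                      for eps in noise])
    baseline = costs.mean()  # variance-reduction baseline
    grad = ((costs - baseline)[:, None, None] * noise).mean(axis=0) / sigma**2
    return theta - lr * grad

def shift(theta):
    """The 'dynamic' part of DMD: shift the plan one step forward and
    pad the tail, anticipating the receding horizon."""
    return np.concatenate([theta[1:], theta[-1:]], axis=0)

# Toy problem: drive a double integrator's position to zero.
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * float(u[0])])
cst = lambda x, u: x[0] ** 2 + 0.01 * float(u[0]) ** 2

x = np.array([1.0, 0.0])
theta = np.zeros((10, 1))            # horizon of 10 one-dimensional controls
for _ in range(30):                  # receding-horizon control loop
    theta = dmd_mpc_step(x, theta, dyn, cst)
    x = dyn(x, theta[0])             # execute the first planned control
    theta = shift(theta)             # warm-start the next round
```

Swapping the Euclidean mirror map for a KL divergence between sampling distributions, or the gradient step for an exponentiated update, recovers sampling-based methods such as MPPI and the cross-entropy method as special instances of the same loop.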
