An Online Learning Approach to Model Predictive Control

Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algorithms. Specifically, we propose a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm designed for non-stationary settings. Our algorithm, Dynamic Mirror Descent Model Predictive Control (DMD-MPC), represents a general family of MPC algorithms that includes many existing techniques as special instances. DMD-MPC also provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms. In our experiments, we demonstrate the flexibility of DMD-MPC, evaluating a set of new MPC algorithms on a simple simulated cartpole and on simulated and real-world aggressive driving tasks. Videos of the real-world experiments can be found at this https URL and this https URL.
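To make the MPC-as-online-learning idea concrete, the following is an illustrative sketch, not the authors' implementation: a DMD-MPC-style loop with a Euclidean mirror map, so the mirror-descent step reduces to a plain gradient step on the mean of a Gaussian sampling distribution over control sequences. The double-integrator dynamics, quadratic cost, step size, and `shift` warm-start rule are all toy assumptions chosen for the example.

```python
# Hedged sketch of a DMD-MPC-style receding-horizon loop.
# Assumptions (not from the paper): toy double-integrator dynamics,
# quadratic cost, Euclidean mirror map, likelihood-ratio gradient.
import numpy as np

def rollout_cost(x0, controls, dynamics, cost):
    """Total cost of applying a control sequence open-loop from state x0."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = dynamics(x, u)
    return total

def dmd_mpc_step(x0, theta, dynamics, cost, n_samples=64, sigma=0.3, lr=0.5):
    """One online-learning round: estimate the gradient of expected cost
    w.r.t. the mean control sequence theta with a likelihood-ratio
    (score-function) estimator, then take a mirror-descent step.
    With a Euclidean mirror map this is just a gradient step."""
    rng = np.random.default_rng(0)  # fixed seed keeps the sketch deterministic
    noise = rng.normal(0.0, sigma, size=(n_samples,) + theta.shape)
    costs = np.array([rollout_cost(x0, theta + eps, dynamics, cost)
                      for eps in noise])
    baseline = costs.mean()  # variance-reduction baseline
    grad = ((costs - baseline)[:, None, None] * noise).mean(axis=0) / sigma**2
    return theta - lr * grad

def shift(theta):
    """The 'dynamic' part of DMD: shift the plan one step forward and
    pad the tail, anticipating the receding horizon."""
    return np.concatenate([theta[1:], theta[-1:]], axis=0)

# Toy problem: drive a double integrator's position to zero.
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * float(u[0])])
cst = lambda x, u: x[0] ** 2 + 0.01 * float(u[0]) ** 2

x = np.array([1.0, 0.0])
theta = np.zeros((10, 1))            # horizon of 10 one-dimensional controls
for _ in range(30):                  # receding-horizon control loop
    theta = dmd_mpc_step(x, theta, dyn, cst)
    x = dyn(x, theta[0])             # execute the first planned control
    theta = shift(theta)             # warm-start the next round
```

Swapping the Euclidean mirror map for a KL divergence between sampling distributions, or the gradient step for an exponentiated update, recovers sampling-based methods such as MPPI and the cross-entropy method as special instances of the same loop.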
