Online Learning for Receding Horizon Control with Provable Regret Guarantees

We address the problem of learning to control an unknown linear dynamical system with time-varying cost functions through the framework of online Receding Horizon Control (RHC). We consider the setting where the control algorithm does not know the true system model and has access only to a fixed-length preview of the future cost functions, that is, a preview whose length does not grow with the control horizon. We characterize the performance of an algorithm by its dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best sequence of actions in hindsight. We propose two online RHC algorithms for this problem: the Certainty Equivalence RHC (CE-RHC) algorithm and the Optimistic RHC (O-RHC) algorithm. We show that, under the standard stability assumption for the model estimate, the CE-RHC algorithm achieves O(T ) dynamic regret. We then extend this result to the setting where the stability assumption holds only for the true system model by proposing the O-RHC algorithm. We show that the O-RHC algorithm achieves O(T ) dynamic regret, at the cost of some additional computation.
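To make the notions of certainty equivalence, fixed-length preview, and dynamic regret concrete, the following is a minimal toy sketch and not the CE-RHC algorithm analyzed in the paper. Everything specific in it is an assumption made for illustration: a scalar noise-free system x_{t+1} = a x_t + b u_t, quadratic stage costs q_t x^2 + r u^2 with a time-varying weight q_t, a short random-input exploration phase, a least-squares model estimate, and the helper names `riccati_gains`, `least_squares`, and `run_ce_rhc`. The controller plans over the previewed window using the estimated model as if it were exact, applies the first planned input, and its cumulative cost is compared with the optimal cost in hindsight computed with the true model.

```python
import random

def riccati_gains(a, b, qs, r):
    """Backward Riccati recursion for stage costs q_k*x^2 + r*u^2 with zero
    terminal cost. Returns gains K_k (u_k = -K_k * x_k) and the weight P_0,
    so that the optimal cost from x_0 is P_0 * x_0**2."""
    p = 0.0
    gains = []
    for q in reversed(qs):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        gains.append(k)
    gains.reverse()
    return gains, p

def least_squares(data):
    """Fit x_next = a*x + b*u from (x, u, x_next) triples via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

def run_ce_rhc(a_true, b_true, qs, r, x0, window, n_explore, seed=0):
    """Explore with random inputs, then apply certainty-equivalent receding
    horizon control using only the next `window` cost weights."""
    rng = random.Random(seed)
    x, cost, data = x0, 0.0, []
    a_hat = b_hat = None
    for t in range(len(qs)):
        if t < n_explore:
            u = rng.uniform(-1.0, 1.0)  # assumed exploration scheme
        else:
            a_hat, b_hat = least_squares(data)  # model estimate from past data
            gains, _ = riccati_gains(a_hat, b_hat, qs[t:t + window], r)
            u = -gains[0] * x  # apply only the first planned input
        x_next = a_true * x + b_true * u  # true system, unknown to the controller
        cost += qs[t] * x * x + r * u * u
        data.append((x, u, x_next))
        x = x_next
    return cost, (a_hat, b_hat)

T = 60
qs = [1.0 + 0.5 * ((t // 10) % 2) for t in range(T)]  # time-varying cost weights
a_true, b_true, r, x0 = 0.9, 0.5, 1.0, 1.0

alg_cost, (a_hat, b_hat) = run_ce_rhc(a_true, b_true, qs, r, x0, window=5, n_explore=5)
_, p0 = riccati_gains(a_true, b_true, qs, r)  # full horizon, true model
opt_cost = p0 * x0 * x0  # cost of the best input sequence in hindsight
regret = alg_cost - opt_cost
print("estimate:", a_hat, b_hat)
print("dynamic regret:", regret)
```

In this toy setting the regret comes from two sources: the inputs spent on exploration and the truncated planning window. Because there is no noise, the model estimate is essentially exact after exploration, which is much more favorable than the setting the paper studies.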
