A Regret Minimization Approach to Iterative Learning Control

We consider the setting of iterative learning control, or model-based policy learning in the presence of uncertain, time-varying dynamics. In this setting, we propose a new performance metric, planning regret, which replaces the standard stochastic uncertainty assumptions with worst-case regret. Building on recent advances in nonstochastic control, we design a new iterative algorithm for minimizing planning regret that is more robust to model mismatch and uncertainty. We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
