Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regret while controlling an unknown system are among the main goals in control theory and reinforcement learning. In this work, we pursue both of these goals for adaptive control of linear quadratic regulators (LQR). Prior works accomplish one of these goals at the cost of the other. Algorithms that are guaranteed to find a stabilizing controller suffer from high regret, whereas algorithms that focus on achieving low regret assume that a stabilizing controller is available at the early stages of agent-environment interaction. In the absence of such a stabilizing controller, the lack of reasonable model estimates needed in the early stages for (i) strategic exploration and (ii) the design of controllers that stabilize the system results in regret that scales exponentially in the problem dimensions. We propose a framework for adaptive control that exploits the characteristics of linear dynamical systems and deploys additional exploration in the early stages of agent-environment interaction to guarantee earlier design of stabilizing controllers. We show that for the classes of controllable and stabilizable LQRs, where the latter is a generalization of prior work, these methods achieve O(√T) regret with polynomial dependence on the problem dimensions.
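For concreteness, the following is a minimal sketch of the standard stochastic LQR setting and the regret notion the abstract refers to; the notation (A_*, B_*, Q, R, J_*) is illustrative and assumed here, not quoted from the paper.

\[
x_{t+1} = A_* x_t + B_* u_t + w_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t,
\]
\[
\mathrm{Regret}(T) \;=\; \sum_{t=0}^{T-1} c_t \;-\; T\, J_*(A_*, B_*),
\]

where $x_t$ is the state, $u_t$ the control input, $w_t$ process noise, and $J_*(A_*, B_*)$ the optimal average expected cost of the true (unknown) system. Under this convention, an O(√T) regret bound means the per-round excess cost of the adaptive controller over the optimal controller vanishes at rate O(1/√T).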
