Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

We present the first computationally efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. This resolves an open question posed by Abbasi-Yadkori and Szepesvári (2011) and by Dean, Mania, Matni, Recht, and Tu (2018).

[1]  Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables, 1967.

[2]  F. T. Wright, et al. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables, 1971.

[3]  F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions are not Necessarily Symmetric, 1973.

[4]  H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.

[5]  Aurea Martínez, et al. A state constrained optimal control problem related to the sterilization of canned foods, 1994, Autom.

[6]  Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[7]  Peter Whittle, et al. Optimal Control: Basics and Beyond, 1996.

[8]  David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.

[9]  Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[10]  Hans P. Geering, et al. Optimal control with engineering applications, 2007.

[11]  John T. Workman, et al. Optimal Control Applied to Biological Models, 2007.

[12]  Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.

[13]  Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.

[14]  Sham M. Kakade, et al. A tail inequality for quadratic forms of subgaussian random vectors, 2011, ArXiv.

[15]  Csaba Szepesvári, et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.

[16]  Adel Javanmard, et al. Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems, 2012, NIPS.

[17]  D. Hearn, et al. Optimal Control Models in Finance, 2013.

[18]  Ambuj Tewari, et al. Finite Time Analysis of Optimal Adaptive Policies for Linear-Quadratic Systems, 2017, ArXiv.

[19]  Yi Ouyang, et al. Learning-based Control of Unknown Linear Systems with Thompson Sampling, 2017, ArXiv.

[20]  Alessandro Lazaric, et al. Thompson Sampling for Linear-Quadratic Control Problems, 2017, AISTATS.

[21]  Sanjeev Arora, et al. Towards Provable Control for Unknown Linear Dynamical Systems, 2018, International Conference on Learning Representations.

[22]  Alessandro Lazaric, et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, 2018, ICML.

[23]  Michael I. Jordan, et al. Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification, 2018, COLT.

[24]  Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.

[25]  Nikolai Matni, et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, 2018, NeurIPS.

[26]  Avinatan Hassidim, et al. Online Linear Quadratic Control, 2018, ICML.

[27]  Nevena Lazic, et al. Regret Bounds for Model-Free Linear Quadratic Control, 2018, ArXiv.

[28]  Nevena Lazic, et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction, 2018, AISTATS.

[29]  Martin J. Wainwright, et al. Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems, 2018, AISTATS.

[30]  Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.

[31]  Ambuj Tewari, et al. Optimism-Based Adaptive Regulation of Linear-Quadratic Systems, 2017, IEEE Transactions on Automatic Control.