论文信息 - Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret - 字舞流文

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesv\'ari (2011) and Dean, Mania, Matni, Recht, and Tu (2018).

Yishay Mansour | Alon Cohen | Tomer Koren | Y. Mansour | Tomer Koren | Alon Cohen

[1] Kazuoki Azuma. WEIGHTED SUMS OF CERTAIN DEPENDENT RANDOM VARIABLES , 1967 .

[2] F. T. Wright,et al. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables , 1971 .

[3] F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions are not Necessarily Symmetric , 1973 .

[4] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .

[5] Aurea Martínez,et al. A state constrained optimal control problem related to the sterilization of canned foods , 1994, Autom..

[6] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[7] Peter Whittle,et al. Optimal Control: Basics and Beyond , 1996 .

[8] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .

[9] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[10] Hans P. Geering,et al. Optimal control with engineering applications , 2007 .

[11] John T. Workman,et al. Optimal Control Applied to Biological Models , 2007 .

[12] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..

[13] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[14] Sham M. Kakade,et al. A tail inequality for quadratic forms of subgaussian random vectors , 2011, ArXiv.

[15] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.

[16] Adel Javanmard,et al. Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems , 2012, NIPS.

[17] D. Hearn,et al. OPTIMAL CONTROL MODELS IN FINANCE , 2013 .

[18] Ambuj Tewari,et al. Finite Time Analysis of Optimal Adaptive Policies for Linear-Quadratic Systems , 2017, ArXiv.

[19] Yi Ouyang,et al. Learning-based Control of Unknown Linear Systems with Thompson Sampling , 2017, ArXiv.

[20] Alessandro Lazaric,et al. Thompson Sampling for Linear-Quadratic Control Problems , 2017, AISTATS.

[21] Sanjeev Arora,et al. Towards Provable Control for Unknown Linear Dynamical Systems , 2018, International Conference on Learning Representations.

[22] Alessandro Lazaric,et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.

[23] Michael I. Jordan,et al. Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification , 2018, COLT.

[24] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.

[25] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.

[26] Avinatan Hassidim,et al. Online Linear Quadratic Control , 2018, ICML.

[27] Nevena Lazic,et al. Regret Bounds for Model-Free Linear Quadratic Control , 2018, ArXiv.

[28] Nevena Lazic,et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.

[29] Martin J. Wainwright,et al. Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems , 2018, AISTATS.

[30] Nikolai Matni,et al. On the Sample Complexity of the Linear Quadratic Regulator , 2017, Foundations of Computational Mathematics.

[31] Ambuj Tewari,et al. Optimism-Based Adaptive Regulation of Linear-Quadratic Systems , 2017, IEEE Transactions on Automatic Control.