Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

This paper studies a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naive discretization, i.e., applying discrete-time RL algorithms to a piecewise approximation of the continuous-time problem, incurs regret that grows linearly in the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation and establish a sublinear regret bound of order $\tilde O(N^{9/10})$. The analysis has two parts: a parameter estimation error bound, which relies on properties of sub-exponential random variables and double stochastic integrals; and a perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity. In the one-dimensional case, the regret bound improves to $\tilde O(\sqrt{N})$.
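To make the regularized least-squares step concrete, here is a minimal sketch, not the paper's exact algorithm: for linear dynamics $dX_t = (A X_t + B u_t)\,dt + dW_t$ observed on a grid of step size $dt$, the continuous-time regularized least-squares estimator $\hat\theta = (\int Z_t Z_t^\top dt + \lambda I)^{-1} \int Z_t\, dX_t^\top$ with regressors $Z_t = (X_t, u_t)$ can be approximated from sampled data. All names below (`estimate_dynamics`, the ridge weight `lam`) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def estimate_dynamics(states, controls, dt, lam=1.0):
    """Ridge-regularized least-squares estimate of theta = [A B].

    states:   array of shape (T+1, d) -- sampled state trajectory
    controls: array of shape (T, m)   -- piecewise-constant controls
    dt:       float                   -- sampling step size
    lam:      float                   -- ridge regularization weight
    Returns theta_hat of shape (d, d + m).
    """
    dX = states[1:] - states[:-1]              # increments dX_t, shape (T, d)
    Z = np.hstack([states[:-1], controls])     # regressors Z_t, shape (T, d+m)
    # Discretized regularized normal equations:
    #   (sum Z_t Z_t^T dt + lam I) theta^T = sum Z_t dX_t^T
    G = Z.T @ Z * dt + lam * np.eye(Z.shape[1])
    theta_hat = np.linalg.solve(G, Z.T @ dX).T
    return theta_hat

# Usage on synthetic one-dimensional data:
rng = np.random.default_rng(0)
A_true, B_true, dt, T = -0.5, 1.0, 0.01, 1000
x = np.zeros((T + 1, 1))
u = rng.normal(size=(T, 1))                    # exploratory controls
for t in range(T):
    drift = A_true * x[t] + B_true * u[t]
    x[t + 1] = x[t] + drift * dt + np.sqrt(dt) * rng.normal(size=1)

print(estimate_dynamics(x, u, dt))             # roughly [[-0.5, 1.0]]
```

The ridge term $\lambda I$ keeps the Gram matrix invertible even before the trajectory data is rich enough to excite all directions, which is what makes the estimator well defined in early episodes.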
