Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning Over a Finite-Time Horizon

We study finite-time horizon, continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order $O((\ln M)(\ln\ln M))$, where $M$ is the number of learning episodes. The analysis has two components: a perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation, and a parameter estimation error analysis, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves a similar logarithmic regret with an additional term depending explicitly on the time stepsizes used in the algorithm.
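To make the discrete-time variant concrete, the following is a minimal sketch (not the paper's algorithm) of least-squares estimation of unknown drift coefficients from discrete-time observations under piecewise constant controls. The dynamics $dX_t = (AX_t + Bu_t)\,dt + dW_t$, the specific matrices, the Euler discretization, and the Gaussian exploratory controls are all illustrative assumptions: one regresses the scaled state increments on the stacked state-control regressor to recover $\theta = (A, B)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (unknown-to-the-learner) coefficients: dX = (A X + B u) dt + dW
A_true = np.array([[-0.5, 0.2], [0.0, -1.0]])
B_true = np.array([[1.0], [0.5]])

d, k = A_true.shape[0], B_true.shape[1]
dt, N = 0.01, 20000  # time stepsize and number of steps (assumed values)

# Simulate one trajectory under piecewise constant exploratory controls,
# constant on each subinterval [n*dt, (n+1)*dt)
X = np.zeros((N + 1, d))
U = rng.normal(size=(N, k))
for n in range(N):
    dW = rng.normal(scale=np.sqrt(dt), size=d)
    X[n + 1] = X[n] + (A_true @ X[n] + B_true @ U[n]) * dt + dW

# Least-squares estimate of theta = [A B]: regress the scaled increments
# (X_{n+1} - X_n)/dt on the stacked regressor z_n = (X_n, u_n)
Z = np.hstack([X[:-1], U])        # shape (N, d + k)
Y = (X[1:] - X[:-1]) / dt         # shape (N, d)
theta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_hat, B_hat = theta_hat.T[:, :d], theta_hat.T[:, d:]
```

As the abstract indicates, the discretization contributes an additional error term depending on the stepsize `dt`, so the estimate is only consistent as both the observation horizon grows and the stepsize shrinks.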
