Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning Over a Finite-Time Horizon
Matteo Basei | Xin Guo | Anran Hu | Yufei Zhang
[1] Alon Cohen, et al. Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently, 2020, ICML.
[2] Martin J. Wainwright. High-Dimensional Statistics, 2019.
[3] Yishay Mansour, et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret, 2019, ICML.
[4] Rémi Munos, et al. A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions, 2000, Machine Learning.
[5] Max Simchowitz, et al. Naive Exploration is Optimal for Online LQR, 2020, ICML.
[6] Benjamin Recht, et al. Certainty Equivalent Control of LQR is Efficient, 2019, arXiv.
[7] Jessica Fuerst, et al. Stochastic Differential Equations and Applications, 2016.
[8] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[9] Zongli Lin, et al. Output Feedback Reinforcement Learning Control for the Continuous-Time Linear Quadratic Regulator Problem, 2018, Annual American Control Conference (ACC).
[10] Yann Ollivier, et al. Making Deep Q-learning Methods Robust to Time Discretization, 2019, ICML.
[11] Rémi Munos, et al. Reinforcement Learning for Continuous Stochastic Control Problems, 1997, NIPS.
[12] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[13] Mark Veraar, et al. The Stochastic Fubini Theorem Revisited, 2012.
[14] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[15] H. Soner, et al. Small Time Path Behavior of Double Stochastic Integrals and Applications to Stochastic Control, 2005, arXiv:math/0602453.
[16] Csaba Szepesvári, et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.
[17] Robert R. Bitmead, et al. Riccati Difference and Differential Equations: Convergence, Monotonicity and Stability, 1991.
[18] Frank L. Lewis, et al. Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning, 2014, IEEE Transactions on Automatic Control.
[19] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.
[20] B. Hambly, et al. Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon, 2020, arXiv.
[21] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[22] X. Zhou, et al. Stochastic Controls: Hamiltonian Systems and HJB Equations, 1999.
[23] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[24] Max Simchowitz, et al. Logarithmic Regret for Adversarial Online Control, 2020, ICML.
[25] Claude-Nicolas Fiechter. PAC Adaptive Control of Linear Systems, 1997, COLT '97.
[26] Lei Guo, et al. Adaptive Continuous-Time Linear Quadratic Gaussian Control, 1999, IEEE Trans. Autom. Control.
[27] Benjamin Van Roy, et al. Model-Based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[28] Petr Mandl, et al. On Least Squares Estimation in Continuous Time Linear Stochastic Systems, 1992, Kybernetika.
[29] Nikolai Matni, et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, 2018, NeurIPS.