Posterior Sampling-Based Reinforcement Learning for Control of Unknown Linear Systems
[1] P. Kumar,et al. Adaptive control with the stochastic approximation algorithm: Geometry and convergence , 1985 .
[2] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[3] Adel Javanmard,et al. Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems , 2012, NIPS.
[4] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn.
[5] Csaba Szepesvári,et al. Bayesian Optimal Control of Smoothly Parameterized Systems , 2015, UAI.
[6] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[7] Nevena Lazic,et al. Regret Bounds for Model-Free Linear Quadratic Control , 2018, ArXiv.
[8] Michael Jong Kim,et al. Thompson Sampling for Stochastic Control: The Finite Parameter Case , 2017, IEEE Transactions on Automatic Control.
[9] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[10] Maria Adler,et al. Stable Adaptive Systems , 2016 .
[11] Yi Ouyang,et al. Learning-based Control of Unknown Linear Systems with Thompson Sampling , 2017, ArXiv.
[12] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[13] Ambuj Tewari,et al. Finite Time Identification in Unstable Linear Systems , 2017, Autom.
[14] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res.
[15] Han-Fu Chen,et al. Convergence rate of least-squares identification and adaptive control for stochastic systems , 1986 .
[16] B. Pasik-Duncan,et al. Adaptive Control , 1996, IEEE Control Systems.
[17] Frank L. Lewis,et al. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , 2012 .
[18] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[19] Jan Sternby,et al. On consistency for the method of least squares using martingale theory , 1977 .
[20] P. Kumar,et al. Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited , 1998 .
[21] Benjamin Van Roy,et al. Posterior Sampling for Reinforcement Learning Without Episodes , 2016, ArXiv.
[22] Yi Ouyang,et al. Learning Unknown Markov Decision Processes: A Thompson Sampling Approach , 2017, NIPS.
[23] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[24] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[25] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[26] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[27] Graham C. Goodwin,et al. Adaptive filtering prediction and control , 1984 .
[28] Benjamin Recht,et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator , 2017, ICML.
[29] Ambuj Tewari,et al. On Optimality of Adaptive Linear-Quadratic Regulators , 2018, ArXiv.
[30] Alessandro Lazaric,et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.
[31] Xi-Ren Cao,et al. Event-Based Optimization of Markov Systems , 2008, IEEE Transactions on Automatic Control.
[32] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[33] S. Sastry,et al. Adaptive Control: Stability, Convergence and Robustness , 1989 .
[34] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[35] Shie Mannor,et al. Thompson Sampling for Learning Parameterized Markov Decision Processes , 2014, COLT.
[36] Alessandro Lazaric,et al. Thompson Sampling for Linear-Quadratic Control Problems , 2017, AISTATS.