Finite-Sample Analysis for SARSA with Linear Function Approximation
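For context, SARSA(0) with linear function approximation, the algorithm whose finite-sample behavior the paper analyzes, maintains a weight vector w with Q(s, a) ≈ wᵀφ(s, a) and performs on-policy temporal-difference updates along a single trajectory. The sketch below is a minimal illustration, not the paper's method: the two-state MDP, one-hot features, and hyperparameters are assumptions chosen for demonstration.

```python
import numpy as np

# Illustrative sketch (hypothetical environment, not from the paper):
# SARSA(0) with linear function approximation, Q(s, a) = w . phi(s, a).
rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.1   # discount, step size, exploration

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

def step(s, a):
    """Toy dynamics: action 0 keeps the state, action 1 flips it.
    Reward 1 when the next state is state 1, else 0."""
    s2 = s if a == 0 else 1 - s
    return s2, float(s2 == 1)

def eps_greedy(w, s):
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))

w = np.zeros(n_states * n_actions)
s = 0
a = eps_greedy(w, s)
for _ in range(5000):
    s2, r = step(s, a)
    a2 = eps_greedy(w, s2)          # on-policy next action (SARSA)
    delta = r + gamma * phi(s2, a2) @ w - phi(s, a) @ w
    w += alpha * delta * phi(s, a)  # semi-gradient TD update
    s, a = s2, a2

print(np.round(w, 2))
```

With one-hot features this reduces to tabular SARSA; the learned weights should prefer actions that reach or stay in the rewarding state.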
[1] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[2] Alessandro Lazaric et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[3] Mark W. Schmidt et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012, arXiv.
[4] Steven J. Bradtke et al. Linear least-squares algorithms for temporal difference learning, 1996, Machine Learning.
[5] Benjamin Recht et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator, 2017, ICML.
[6] Csaba Szepesvári et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.
[7] Dimitri P. Bertsekas et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[8] Benjamin Van Roy et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[9] Shie Mannor et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning, 2017, COLT.
[10] Sébastien Bubeck et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[11] R. Srikant et al. Finite-Time Error Bounds for Linear Stochastic Approximation and TD Learning, 2019, COLT.
[12] Alessandro Lazaric et al. Finite-sample analysis of least-squares policy iteration, 2012, J. Mach. Learn. Res.
[13] Csaba Szepesvári et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[14] J. W. Nieuwenhuis et al. Book review of D. P. Bertsekas (ed.), Dynamic Programming and Optimal Control, Volume 2, 1999.
[15] Csaba Szepesvári et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[16] Mahesan Niranjan et al. On-line Q-learning using connectionist systems, 1994.
[17] Jalaj Bhandari et al. A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation, 2018, COLT.
[18] Devavrat Shah et al. Q-learning with Nearest Neighbors, 2018, NeurIPS.
[19] Justin A. Boyan et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[20] A. Y. Mitrophanov et al. Sensitivity and convergence of uniformly ergodic Markov chains, 2005.
[21] Alessandro Lazaric et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[22] Yingbin Liang et al. Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples, 2019, NeurIPS.
[23] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[24] Shie Mannor et al. Finite Sample Analyses for TD(0) with Function Approximation, 2017, AAAI.
[25] Rémi Munos et al. Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control, 2013, ECML/PKDD.
[26] R. Srikant et al. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning, 2019, NeurIPS.
[27] Alexander Shapiro et al. Stochastic Approximation Approach to Stochastic Programming, 2013.
[28] John N. Tsitsiklis et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[29] Kavosh Asadi et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[30] Zhuoran Yang et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.
[31] H. Kushner. Stochastic approximation: a survey, 2010.
[32] Tommi S. Jaakkola et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[33] Bruno Scherrer et al. Rate of Convergence and Error Bounds for LSTD(λ), 2015, ICML.
[34] Doina Precup et al. A Convergent Form of Approximate Policy Iteration, 2002, NIPS.
[35] Theodore J. Perkins et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains, 2002, ICML.
[36] Alessandro Lazaric et al. LSTD with Random Projections, 2010, NIPS.
[37] Sean P. Meyn et al. An analysis of reinforcement learning with function approximation, 2008, ICML.
[38] Csaba Szepesvári et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[39] Csaba Szepesvári et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?, 2018, AISTATS.