On Convergence of Gradient Expected Sarsa($\lambda$).
Gang Pan | Gang Zheng | Pengfei Li | Long Yang | Yu Zhang | Qian Zheng