Shie Mannor | Rémi Munos | Aviv Tamar | Assaf Hallak
[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[2] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[4] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[5] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[6] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[9] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, Journal of Computational and Applied Mathematics.
[10] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[11] Huizhen Yu, et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions, 2012, SIAM J. Control. Optim.
[12] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[13] Martha White, et al. Emphatic Temporal-Difference Learning, 2015, ArXiv.
[14] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[15] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.