Finite-Time Bounds for Sampling-Based Fitted Dynamic Programming
暂无分享,去创建一个
[1] A. Antos,et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[3] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[4] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[5] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[6] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[7] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[8] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[9] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[10] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[11] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[12] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[13] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[14] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.