Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
[1] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[2] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[3] Sridhar Mahadevan, et al. Representation Policy Iteration, 2005, UAI.
[4] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[5] Lihong Li, et al. A worst-case comparison between temporal difference and residual gradient with linear function approximation, 2008, ICML '08.
[6] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[9] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[10] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[11] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.
[12] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[13] Andrew G. Barto, et al. Reinforcement learning, 1998.
[14] Michail G. Lagoudakis, et al. Least-Squares Methods in Reinforcement Learning for Control, 2002, SETN.
[15] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[16] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[17] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.