Regularized Policy Iteration
Shie Mannor | Csaba Szepesvári | Mohammad Ghavamzadeh | Amir Massoud Farahmand
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[3] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[4] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[5] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[6] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[7] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res.
[8] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[9] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[10] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res.
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[12] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[13] Daniel Polani,et al. Least Squares SVM for Least Squares TD Learning , 2006, ECAI.
[14] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[15] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[16] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[17] M. Loth,et al. Sparse Temporal Difference Learning Using LASSO , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.