Model-Free Monte Carlo-like Policy Evaluation
Raphaël Fonteneau | Susan A. Murphy | Louis Wehenkel | Damien Ernst
[1] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[2] Csaba Szepesvári et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[3] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[4] Bart De Schutter et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[5] Richard S. Sutton et al. Reinforcement Learning, 1992, Machine Learning.
[6] Christos Dimitrakakis et al. Rollout sampling approximate policy iteration, 2008, Machine Learning.
[7] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[8] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[9] Liming Xiang et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[10] S. Murphy et al. Optimal dynamic treatment regimes, 2003.
[11] Peter Dayan et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[12] J. Ingersoll. Theory of Financial Decision Making, 1987.
[13] Peter Dayan et al. Q-learning, 1992, Machine Learning.
[14] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[15] Steven J. Bradtke et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[16] John N. Tsitsiklis et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[17] Csaba Szepesvári et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[18] Mahesan Niranjan et al. On-line Q-learning using connectionist systems, 1994.