Model-Free Monte Carlo Estimation of Decision Policies
Raphaël Fonteneau | Susan A. Murphy | Louis Wehenkel | Damien Ernst
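This paper appears to be the French-language counterpart of the authors' AISTATS 2010 paper "Model-Free Monte Carlo-like Policy Evaluation": estimating the finite-horizon return of a decision policy directly from a batch of one-step transitions, with no model of the system and no simulator. Below is a minimal sketch of that model-free Monte Carlo (MFMC) idea, assuming Euclidean state and action spaces; the function name `mfmc_estimate` and its signature are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def mfmc_estimate(transitions, policy, x0, horizon, n_traj, gamma=1.0):
    """Sketch of a model-free Monte Carlo return estimate (hypothetical API).

    transitions -- list of one-step samples (x, u, r, y)
    policy      -- callable (t, x) -> action
    Rebuilds n_traj artificial trajectories from the batch by
    nearest-neighbor matching in (state, action) space, consuming
    each sample at most once, then averages the rebuilt returns.
    """
    assert len(transitions) >= horizon * n_traj, "not enough samples in the batch"
    available = set(range(len(transitions)))
    returns = []
    for _ in range(n_traj):
        x, ret = np.atleast_1d(np.asarray(x0, dtype=float)), 0.0
        for t in range(horizon):
            u = np.atleast_1d(policy(t, x))
            # closest unused sample under a Euclidean state-action distance
            best = min(
                available,
                key=lambda l: np.linalg.norm(np.atleast_1d(transitions[l][0]) - x)
                + np.linalg.norm(np.atleast_1d(transitions[l][1]) - u),
            )
            available.remove(best)
            _, _, r, y = transitions[best]
            ret += gamma ** t * r  # accumulate the matched sample's reward
            x = np.atleast_1d(np.asarray(y, dtype=float))  # jump to its successor state
        returns.append(ret)
    return float(np.mean(returns))
```

Each artificial trajectory consumes its nearest-neighbor samples, so the rebuilt returns are computed from disjoint subsets of the batch before being averaged, which is what makes the estimate Monte Carlo-like rather than a single nearest-neighbor rollout.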