Prioritized Sweeping Converges to the Optimal Value Function
[1] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] David Andre, et al. Generalized Prioritized Sweeping, 1997, NIPS.
[4] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[5] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.
[6] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[7] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[8] Kevin D. Seppi, et al. Prioritization Methods for Accelerating MDP Solvers, 2005, J. Mach. Learn. Res.
[9] Andrew W. Moore, et al. Prioritized sweeping: Reinforcement learning with less data and less time, 2004, Machine Learning.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.