P3VI: a partitioned, prioritized, parallel value iterator
暂无分享,去创建一个
[1] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[2] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[3] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
[4] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[5] Vipin Kumar,et al. Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[6] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[7] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[9] Kevin D. Seppi,et al. Efficient Value Iteration Using Partitioned Models , 2003, ICMLA.
[10] C. Alpert,et al. Multi-way graph and hypergraph partitioning , 1996 .
[11] Dimitri Bertsekas,et al. Distributed dynamic programming , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[12] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..