Solving Large POMDPs using Real Time Dynamic Programming
[1] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[2] Mark S. Boddy, et al. An Analysis of Time-Dependent Planning, 1988, AAAI.
[3] C. Watkins. Learning from Delayed Rewards, 1989.
[4] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[5] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[6] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[9] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[10] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[11] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[13] Marco Wiering, et al. HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning, 1996.
[14] Blai Bonet, et al. A Robust and Fast Action Selection Mechanism for Planning, 1997, AAAI/IAAI.
[15] Blai Bonet. High-Level Planning and Control with Incomplete Information Using POMDP's, 1998.