Paper information - Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic

Real-time dynamic programming (RTDP) is a heuristic search algorithm for solving MDPs. We present a modified algorithm called Focused RTDP with several improvements. While RTDP maintains only an upper bound on the long-term reward function, FRTDP maintains two-sided bounds and bases the output policy on the lower bound. FRTDP guides search with a new rule for outcome selection, focusing on parts of the search graph that contribute most to uncertainty about the values of good policies. FRTDP has modified trial termination criteria that should allow it to solve some problems (within ε) that RTDP cannot. Experiments show that for all the problems we studied, FRTDP significantly outperforms RTDP and LRTDP, and converges with up to six times fewer backups than the state-of-the-art HDP algorithm.
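To make the idea of two-sided bounds concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a discounted, reward-maximizing MDP and a hypothetical three-state toy problem. Its outcome-selection rule (transition probability times the remaining bound gap) is a deliberately simplified stand-in for the rule described in the paper. All names (`MDP`, `GAMMA`, `backup`, `trial`, `solve`) are illustrative assumptions.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical toy MDP: MDP[state][action] = (reward, {next_state: probability}).
MDP = {
    "s0": {"safe": (0.0, {"s1": 1.0}),
           "risky": (0.2, {"goal": 0.5, "s0": 0.5})},
    "s1": {"go": (1.0, {"goal": 1.0})},
    "goal": {},  # terminal state: no actions, value 0
}


def q_value(bound, state, action):
    """One-step lookahead value of an action under a given bound function."""
    reward, outcomes = MDP[state][action]
    return reward + GAMMA * sum(p * bound[s2] for s2, p in outcomes.items())


def backup(lower, upper, state):
    """Bellman backup of both bounds; returns the action greedy in the upper bound."""
    actions = MDP[state]
    lower[state] = max(q_value(lower, state, a) for a in actions)
    best = max(actions, key=lambda a: q_value(upper, state, a))
    upper[state] = q_value(upper, state, best)
    return best


def trial(lower, upper, start, epsilon, max_depth=50):
    """One search trial: walk forward while the bound gap is large, then back up."""
    state, visited = start, []
    for _ in range(max_depth):
        # Stop at terminal states or where the bounds already agree within epsilon.
        if not MDP[state] or upper[state] - lower[state] <= epsilon:
            break
        visited.append(state)
        action = backup(lower, upper, state)
        _, outcomes = MDP[state][action]
        # Simplified outcome selection (an assumption of this sketch):
        # follow the successor with the largest probability-weighted gap.
        state = max(outcomes, key=lambda s2: outcomes[s2] * (upper[s2] - lower[s2]))
    for s in reversed(visited):
        backup(lower, upper, s)


def solve(start="s0", epsilon=1e-3, max_trials=1000):
    # Rewards lie in [0, 1], so 0 and 1 / (1 - GAMMA) are valid initial bounds.
    lower = {s: 0.0 for s in MDP}
    upper = {s: (1.0 / (1.0 - GAMMA) if MDP[s] else 0.0) for s in MDP}
    for _ in range(max_trials):
        if upper[start] - lower[start] <= epsilon:
            break
        trial(lower, upper, start, epsilon)
    # The output policy is greedy with respect to the lower bound.
    policy = {s: max(MDP[s], key=lambda a: q_value(lower, s, a))
              for s in MDP if MDP[s]}
    return lower, upper, policy


if __name__ == "__main__":
    lower, upper, policy = solve()
    print(policy, lower["s0"], upper["s0"])
```

Because the lower bound never exceeds the true value, a policy that is greedy with respect to it has a value guarantee at every point in the search, and the gap between the bounds at the start state gives a natural stopping test.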

Reid G. Simmons | Trey Smith

[1] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[2] Richard Goodwin. Meta-Level Control for Decision-Theoretic Planners, 1996.

[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[4] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2001, J. Artif. Intell. Res.

[5] Shlomo Zilberstein, et al. LAO*: A heuristic search algorithm that finds solutions with loops, 2001, Artif. Intell.

[6] Blai Bonet, et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback, 2003, IJCAI.

[7] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.

[8] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.

[9] Geoffrey J. Gordon, et al. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees, 2005, ICML.

[10] Geoffrey J. Gordon, et al. Fast Exact Planning in Markov Decision Processes, 2005, ICAPS.