Paper information - Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique to efficiently find suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
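
To make the idea of keeping both an upper and a lower bound on the optimal value function concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the 3-state problem, the names `MDP`, `backup`, and `tighten_bounds`, and the initial bounds are all assumptions made for illustration. The sketch also sweeps every non-goal state for simplicity, whereas the abstract describes restricting computation to states relevant to the start-to-goal task.

```python
# Hypothetical 3-state stochastic shortest-path problem; state 2 is the goal.
# MDP[state][action] = (cost, [(probability, next_state), ...])
MDP = {
    0: {"safe": (2.0, [(1.0, 1)]),
        "risky": (1.0, [(0.5, 2), (0.5, 0)])},
    1: {"go": (1.0, [(1.0, 2)])},
}
GOAL = 2


def backup(values, state):
    """One Bellman backup: min over actions of cost + expected successor value."""
    return min(
        cost + sum(p * values[nxt] for p, nxt in outcomes)
        for cost, outcomes in MDP[state].values()
    )


def tighten_bounds(lower, upper, start, eps=1e-3, max_sweeps=1000):
    """Back up both bounds until their gap at the start state is at most eps."""
    for _ in range(max_sweeps):
        if upper[start] - lower[start] <= eps:
            break
        for s in MDP:  # non-goal states only; the goal keeps value 0
            lower[s] = backup(lower, s)
            upper[s] = backup(upper, s)
    return lower, upper


# Assumed initial bounds: 0 is a valid lower bound because costs are positive;
# 10 is a valid upper bound for this toy problem (chosen by hand, not computed).
lower = {0: 0.0, 1: 0.0, GOAL: 0.0}
upper = {0: 10.0, 1: 10.0, GOAL: 0.0}
eps = 1e-3

lower, upper = tighten_bounds(lower, upper, start=0, eps=eps)
print(f"start-state value lies in [{lower[0]:.4f}, {upper[0]:.4f}]")
```

In this toy problem the optimal cost from state 0 is 2 (repeating the "risky" action), and the two bounds close in on it from either side. The gap at the start state gives a stopping criterion that a single value estimate cannot provide.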

Geoffrey J. Gordon | Maxim Likhachev | H. Brendan McMahan
