Paper Information - Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an ε-optimal policy (except in the special case of discounting). We introduce a simple and efficient test for convergence that applies to SSP problems with positive action costs. The test can detect whether a policy is proper, that is, whether it achieves the goal state with probability 1. If proper, it gives error bounds that can be used to detect convergence to an ε-optimal solution. The convergence test incurs no extra overhead besides computing the Bellman residual, and the performance guarantee it provides substantially improves the utility of this class of planning algorithms.
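To make the idea concrete, here is a minimal sketch of a residual-based convergence test inside value iteration. It is an illustration under stated assumptions, not necessarily the exact bound derived in the paper. It assumes a non-negative value function J that is a lower bound on the optimal cost (here, J starts at zero), a Bellman residual δ, and a minimum action cost c_min over non-goal states. If δ < c_min, a standard argument shows that the greedy policy reaches the goal in at most J(s)/(c_min − δ) expected steps, so it is proper, and its cost exceeds J(s) by at most δ·J(s)/(c_min − δ). The function names and the three-state example are hypothetical.

```python
import numpy as np

def bellman_backup(J, P, c, goal):
    """One synchronous Bellman backup for an SSP.

    P[a] is an (n, n) transition matrix and c[a] an (n,) cost vector for action a.
    Returns the backed-up values, the greedy policy w.r.t. J, and the residual.
    """
    Q = np.stack([c[a] + P[a] @ J for a in range(len(P))])  # (actions, states)
    J_new = Q.min(axis=0)
    policy = Q.argmin(axis=0)
    J_new[goal] = 0.0  # the goal is absorbing and cost-free
    residual = np.max(np.abs(J_new - J))
    return J_new, policy, residual

def solve_ssp(P, c, goal, epsilon=1e-3, max_iters=10_000):
    """Value iteration that stops once the greedy policy is provably epsilon-optimal."""
    n = P[0].shape[0]
    non_goal = np.arange(n) != goal
    c_min = min(c_a[non_goal].min() for c_a in c)  # assumed strictly positive
    J = np.zeros(n)  # lower bound on the optimal cost, since all costs are positive
    for _ in range(max_iters):
        J_new, policy, delta = bellman_backup(J, P, c, goal)
        if delta < c_min:
            # Assumed bound: the greedy policy is proper and its cost exceeds J
            # by at most delta * J(s) / (c_min - delta) in every state s.
            gap = delta * J.max() / (c_min - delta)
            if gap <= epsilon:
                return J, policy, gap
        J = J_new
    raise RuntimeError("no epsilon-optimal policy found within max_iters")

# Hypothetical example: states 0 and 1, goal state 2, two actions, unit costs.
P = [
    np.array([[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]),  # action 0
    np.array([[0.9, 0.0, 0.1], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),  # action 1
]
c = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])]

J, policy, gap = solve_ssp(P, c, goal=2, epsilon=1e-3)
```

In this example the optimal costs are 3 and 2 for states 0 and 1, and the test needs nothing beyond the residual that value iteration already computes at each sweep.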

Eric A. Hansen | Ibrahim Abdoulahi
