Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an e-optimal policy (except in the special case of discounting). We introduce a simple and efficient test for convergence that applies to SSP problems with positive action costs. The test can detect whether a policy is proper, that is, whether it achieves the goal state with probability 1. If proper, it gives error bounds that can be used to detect convergence to an e-optimal solution. The convergence test incurs no extra overhead besides computing the Bellman residual, and the performance guarantee it provides substantially improves the utility of this class of planning algorithms.

[1]  Reid G. Simmons,et al.  Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic , 2006, AAAI.

[2]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[3]  Blai Bonet,et al.  Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and Its Application to MDPs , 2006, ICAPS.

[4]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[5]  Scott Sanner,et al.  Bayesian Real-Time Dynamic Programming , 2009, IJCAI.

[6]  Kevin D. Seppi,et al.  Prioritization Methods for Accelerating MDP Solvers , 2005, J. Mach. Learn. Res..

[7]  Blai Bonet,et al.  Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming , 2003, ICAPS.

[8]  Geoffrey J. Gordon,et al.  Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees , 2005, ICML.

[9]  Blai Bonet,et al.  Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback , 2003, IJCAI.

[10]  Blai Bonet,et al.  On the Speed of Convergence of Value Iteration on Stochastic Shortest-Path Problems , 2007, Math. Oper. Res..

[11]  John N. Tsitsiklis,et al.  An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..

[12]  Patrick Doherty,et al.  Iterative Bounding LAO , 2010, ECAI.

[13]  Eric A. Hansen,et al.  Suboptimality Bounds for Stochastic Shortest Path Problems , 2011, UAI.

[14]  L. Kallenberg Finite State and Action MDPS , 2003 .

[15]  Shlomo Zilberstein,et al.  LAO*: A heuristic search algorithm that finds solutions with loops , 2001, Artif. Intell..