A Theory of Goal-Oriented MDPs with Dead Ends

Stochastic Shortest Path (SSP) MDPs are a problem class widely studied in AI, especially in probabilistic planning. They describe a wide range of scenarios but make the restrictive assumption that the goal is reachable from every state, i.e., that dead-end states do not exist. Because of this, SSPs cannot model scenarios with possibly catastrophic events (e.g., an airplane crashing if it flies into a storm). Even though MDP algorithms have been used for solving problems with dead ends, a principled theory of SSP extensions that allow dead ends, including theoretically sound algorithms for solving such MDPs, has been lacking. In this paper, we propose three new MDP classes that admit dead ends under increasingly weaker assumptions. We present Value Iteration-based algorithms, as well as more efficient heuristic search algorithms, for optimally solving each class, and explore theoretical relationships between these classes. We also conduct a preliminary empirical study comparing the performance of our algorithms on different MDP classes, especially on scenarios with unavoidable dead ends.
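To illustrate the setting the abstract describes, below is a minimal Value Iteration sketch for a goal-oriented MDP containing a dead-end state, where the dead end is handled by assigning it a finite penalty so that state values remain bounded. The tiny MDP, the penalty value `D`, and all state/action names are hypothetical and chosen only for illustration; this is not the paper's algorithm, just a common finite-penalty treatment of dead ends.

```python
# Toy goal-oriented MDP with one dead end, solved by Value Iteration.
# Every action costs 1; the dead-end state is given a finite penalty D
# (a hypothetical value) so that values stay bounded even though the
# goal is unreachable from the dead end.

GOAL, DEAD = "goal", "dead"
D = 10.0  # finite dead-end penalty (illustrative choice)

# transitions[state][action] = list of (probability, next_state)
transitions = {
    "s0": {
        "safe":  [(1.0, "s1")],                 # slow but sure route
        "risky": [(0.6, GOAL), (0.4, DEAD)],    # shortcut that may crash
    },
    "s1": {
        "go": [(1.0, GOAL)],
    },
}

def value_iteration(transitions, D, eps=1e-6, cost=1.0):
    """Iterate Bellman backups until the max value change falls below eps."""
    V = {s: 0.0 for s in transitions}
    V[GOAL], V[DEAD] = 0.0, D
    while True:
        delta = 0.0
        for s, acts in transitions.items():
            best = min(
                cost + sum(p * V[ns] for p, ns in outcomes)
                for outcomes in acts.values()
            )
            best = min(best, D)  # no state can be worse than the dead-end penalty
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration(transitions, D)
# With D = 10, "risky" from s0 costs 1 + 0.4 * 10 = 5 in expectation,
# so the optimal policy takes "safe" and V["s0"] = 2.
```

Note how the choice of `D` determines whether the risk of hitting the dead end outweighs the shortcut: with a small enough penalty, "risky" would become optimal, which is exactly the kind of trade-off a principled theory of dead ends must pin down.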
