Depth-based short-sighted stochastic shortest path problems

Stochastic Shortest Path Problems (SSPs) are a common representation for probabilistic planning problems. Two approaches can be used to solve SSPs: (i) consider all probabilistically reachable states and (ii) plan only for a subset of these reachable states. Closed policies, the solutions obtained by the former approach, require significant computational effort, but they do not require replanning, i.e., the planner is never re-invoked. The second approach, employed by replanners, computes open policies, i.e., policies defined only for a subset of the probabilistically reachable states. Consequently, when a state is reached in which the open policy is not defined, the replanner is re-invoked to compute a new open policy. In this article, we introduce a special case of SSPs, the depth-based short-sighted SSPs, in which every state has a nonzero probability of being reached using at most t actions. We also introduce the novel algorithm Short-Sighted Probabilistic Planner (SSiPP), which solves SSPs through depth-based short-sighted SSPs and guarantees that at least t actions can be executed without replanning. SSiPP can therefore compute both open and closed policies: as t increases, the returned policy approaches the behavior of a closed policy, and for t large enough, the returned policy is closed. Moreover, we present two extensions to SSiPP: Labeled-SSiPP and SSiPP-FF. The former incorporates a labeling mechanism to avoid revisiting states that have already converged; the latter combines SSiPP and determinizations to improve the performance of SSiPP on problems without dead ends. We also performed an extensive empirical evaluation of SSiPP and its extensions on several problems against state-of-the-art planners. The results show that (i) Labeled-SSiPP outperforms SSiPP and the considered planners in the task of finding the optimal solution when the problems have a low percentage of relevant states; and (ii) SSiPP-FF outperforms SSiPP in the task of quickly finding suboptimal solutions to problems without dead ends, while performing similarly in problems with dead ends.
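To make the planning loop concrete, the following is a minimal sketch of SSiPP's main loop, assuming an SSP encoded as explicit dictionaries (A[s] maps each action applicable in s to a list of (probability, successor) pairs), a cost function cost(s, a), a goal set, and an admissible heuristic h; value iteration stands in for the optimal sub-solver. All identifiers (build_short_sighted, solve_short_sighted, ssipp, sample) are illustrative and not taken from the authors' implementation.

```python
import random

def build_short_sighted(s0, t, A, goals):
    """Breadth-first search collecting every state reachable from s0
    using at most t actions, together with its minimal depth."""
    depth, order = {s0: 0}, [s0]
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        if u in goals or depth[u] == t:
            continue  # goals and depth-t states are not expanded
        for outcomes in A[u].values():
            for _p, v in outcomes:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    order.append(v)
    return order, depth

def solve_short_sighted(s0, t, A, cost, goals, h, eps=1e-4):
    """Solve the depth-based short-sighted SSP rooted at s0 optimally
    (value iteration here; any optimal SSP solver would do).  States at
    depth exactly t act as artificial goals whose terminal cost is the
    heuristic estimate h."""
    order, depth = build_short_sighted(s0, t, A, goals)
    V = {u: 0.0 if u in goals else h(u) for u in order}
    interior = [u for u in order if u not in goals and depth[u] < t]
    residual = eps
    while residual >= eps:
        residual = 0.0
        for u in interior:  # assumes interior states have actions (no dead ends)
            q = min(cost(u, a) + sum(p * V[v] for p, v in outs)
                    for a, outs in A[u].items())
            residual = max(residual, abs(q - V[u]))
            V[u] = q
    sub_goals = {u for u in order if u in goals or depth[u] == t}
    def policy(u):  # greedy policy w.r.t. the converged values
        return min(A[u], key=lambda a: cost(u, a)
                   + sum(p * V[v] for p, v in A[u][a]))
    return policy, sub_goals

def sample(outcomes):
    """Draw a successor from a list of (probability, state) pairs."""
    r, acc = random.random(), 0.0
    for p, v in outcomes:
        acc += p
        if r <= acc:
            return v
    return outcomes[-1][1]

def ssipp(s, t, A, cost, goals, h):
    """SSiPP main loop: solve the depth-t short-sighted SSP around the
    current state and execute its policy until a sub-problem goal is
    reached; only then is the planner re-invoked."""
    trajectory = [s]
    while s not in goals:
        policy, sub_goals = solve_short_sighted(s, t, A, cost, goals, h)
        while s not in sub_goals:
            s = sample(A[s][policy(s)])
            trajectory.append(s)
    return trajectory
```

Because every artificial goal lies at minimal depth t from the root, any execution segment ending at one has taken at least t actions before replanning, which is the guarantee stated above; and when t is large enough that no artificial goals are created, the returned policy is closed.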
