Probabilistic Planning via Determinization in Hindsight

This paper investigates hindsight optimization as an approach for leveraging the significant advances in deterministic planning for action selection in probabilistic domains. Hindsight optimization is an online technique that evaluates the one-step-reachable states by sampling future outcomes to generate multiple non-stationary deterministic planning problems, which can then be solved using search. Hindsight optimization has been successfully used in a number of online scheduling applications; however, it has not yet been considered in the substantially different context of goal-based probabilistic planning. We describe an implementation of hindsight optimization for probabilistic planning based on deterministic forward heuristic search and evaluate its performance on planning-competition benchmarks and other probabilistically interesting problems. The planner outperforms a number of probabilistic planners, including FF-Replan, on many problems. Finally, we investigate conditions under which hindsight optimization is guaranteed to be effective with respect to goal achievement, and we illustrate examples where the approach can go wrong.
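
To make the action-selection loop concrete, the following is a minimal sketch of hindsight optimization on a toy problem. It is not the implementation described in the paper, which builds on deterministic forward heuristic search. Everything here is an assumption made for illustration: the chain-world domain, the `walk` and `jump` actions and their probabilities, the names `sample_future`, `solvable`, and `hindsight_q`, and the use of plain breadth-first search as the deterministic solver. Each sampled future fixes one outcome per time step and action, which yields a non-stationary deterministic problem; an action's value is estimated as the fraction of sampled futures in which the goal remains reachable after taking that action first.

```python
import random
from collections import deque

# Hypothetical toy domain: walk along a chain from state 0 to GOAL.
GOAL, DEAD, HORIZON = 4, -1, 6
# Each action has a list of (probability, state delta); a delta of None is a dead end.
ACTIONS = {
    "walk": [(0.9, 1), (0.1, 0)],     # usually advances one step, sometimes stalls
    "jump": [(0.5, 2), (0.5, None)],  # advances two steps or falls into a dead end
}

def sample_future(rng):
    """Fix one outcome index per (time step, action).

    The result is a non-stationary deterministic problem: the effect of an
    action depends on the time step at which it is applied.
    """
    return [
        {
            name: rng.choices(range(len(outs)), weights=[p for p, _ in outs])[0]
            for name, outs in ACTIONS.items()
        }
        for _ in range(HORIZON)
    ]

def apply(state, action, t, future):
    """Deterministic successor of `state` under `action` at time `t` in this future."""
    _, delta = ACTIONS[action][future[t][action]]
    return DEAD if delta is None else min(state + delta, GOAL)

def solvable(state, t, future):
    """Breadth-first search over (state, time): is the goal reachable within the horizon?"""
    frontier, seen = deque([(state, t)]), {(state, t)}
    while frontier:
        s, k = frontier.popleft()
        if s == GOAL:
            return True
        if s == DEAD or k >= HORIZON:
            continue
        for a in ACTIONS:
            nxt = (apply(s, a, k, future), k + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def hindsight_q(state, n_samples=200, seed=0):
    """Estimate each action's value as the fraction of sampled futures that stay solvable."""
    rng = random.Random(seed)
    futures = [sample_future(rng) for _ in range(n_samples)]
    return {
        a: sum(solvable(apply(state, a, 0, f), 1, f) for f in futures) / n_samples
        for a in ACTIONS
    }

def choose_action(state, **kwargs):
    """Pick the action with the highest hindsight value estimate."""
    q = hindsight_q(state, **kwargs)
    return max(q, key=q.get), q

if __name__ == "__main__":
    action, q_values = choose_action(0)
    print(action, q_values)
```

In this sketch the same sampled futures are reused for every candidate action, so the estimates are compared on a common set of outcomes. Because each deterministic problem is solved with full knowledge of its future, the estimates are optimistic relative to what any real policy can achieve, which is one reason the approach can sometimes be misled.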
