Occupation Measure Heuristics for Probabilistic Planning

For the past 25 years, heuristic search has been used to solve domain-independent probabilistic planning problems, but with heuristics that determinise the problem and thereby discard valuable probabilistic information. To remedy this situation, we explore the use of occupation measures, which represent the expected number of times a given action is executed in a given state under a policy. By relaxing the well-known linear program that computes them, we derive occupation measure heuristics, the first admissible heuristics for stochastic shortest path problems (SSPs) that take probabilities into account. We show that these heuristics can also be obtained by extending recent operator-counting heuristic formulations used in deterministic planning. Since the heuristics are formulated as linear programs over occupation measures, they can easily be extended to more complex probabilistic planning models, such as constrained SSPs (C-SSPs). Moreover, their formulation can be tightly integrated into i-dual, a recent LP-based heuristic search algorithm for (constrained) SSPs, resulting in a novel probabilistic planning approach in which policy update and heuristic computation work in unison. Our experiments in several domains demonstrate the benefits of the new heuristics and of this integrated approach.
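To make the notion of an occupation measure concrete, the following minimal sketch approximates, for a fixed deterministic policy, the expected number of times the policy's action is executed in each state. It is only an illustration of the definition, not the heuristics or the linear program described above. The function name `occupation_measures`, the dictionary-based encoding, and the toy SSP are all assumptions introduced for this example.

```python
def occupation_measures(transitions, policy, s0, goals, sweeps=200):
    """Approximate x[s]: expected number of times policy[s] is executed in s.

    transitions: maps (state, action) to {successor: probability} (hypothetical encoding)
    policy:      maps each non-goal state to the action chosen there
    """
    mass = {s0: 1.0}  # probability mass located at each state at the current step
    x = {}
    for _ in range(sweeps):
        nxt = {}
        for s, m in mass.items():
            if s in goals:
                continue  # goal states are absorbing; no action is executed there
            x[s] = x.get(s, 0.0) + m  # the action in s is executed with probability m
            for s2, p in transitions[(s, policy[s])].items():
                nxt[s2] = nxt.get(s2, 0.0) + m * p
        mass = nxt
    return x


# Toy SSP (assumed for illustration): action "a" in s0 reaches the goal g
# with probability 0.5 and otherwise stays in s0; every action costs 1.
transitions = {("s0", "a"): {"g": 0.5, "s0": 0.5}}
policy = {"s0": "a"}
x = occupation_measures(transitions, policy, "s0", {"g"})
expected_cost = sum(1.0 * v for v in x.values())
```

In this toy problem the occupation measure of executing `a` in `s0` converges to 2, so with unit costs the expected cost of the policy is also 2. A determinised view that keeps only the successful outcome would estimate a cost of 1, which is the kind of probabilistic information the abstract says determinisation ignores.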
