Hindsight Optimization for Probabilistic Planning with Factored Actions

Inspired by the success of the satisfiability approach to deterministic planning, we propose a novel framework for on-line stochastic planning that embeds hindsight optimization in a reduction to integer linear programming. In contrast to previous work using reductions or hindsight optimization, our formulation is general purpose: it works directly with domain specifications over factored state and action spaces, and is therefore scalable, in principle, to exponentially large action spaces. Our approach is competitive with state-of-the-art stochastic planners on challenging benchmark problems and sometimes exceeds their performance, especially in problems with large action spaces.
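To make the idea of hindsight optimization over factored actions concrete, the following is a minimal sketch and not the formulation proposed here. It assumes a hypothetical toy domain (binary "machine running" state variables, one binary "repair" action variable per machine) and replaces the integer linear program with brute-force enumeration, which is feasible only at this tiny scale. The names `step`, `best_return_in_hindsight`, and `hindsight_action`, and the parameters `p_fail` and `repair_cost`, are illustrative assumptions. Each sampled future fixes all random outcomes in advance, the best plan is found per future for a fixed first action, and the first action with the highest average hindsight return is selected.

```python
import itertools
import random


def step(state, action, noise, p_fail=0.3, repair_cost=0.6):
    """Deterministic transition once the noise for this time step is fixed.

    Hypothetical toy domain: reward is the number of running machines
    minus a cost per repair; a repaired machine runs at the next step,
    an unrepaired running machine fails when its noise value is < p_fail.
    """
    reward = sum(state) - repair_cost * sum(action)
    next_state = tuple(
        1 if a else (1 if s and u >= p_fail else 0)
        for s, a, u in zip(state, action, noise)
    )
    return next_state, reward


def best_return_in_hindsight(state, first_action, future):
    """Best total reward in one sampled future, given a fixed first action.

    Brute-force enumeration stands in for an optimization solver here.
    """
    n = len(state)
    joint_actions = list(itertools.product((0, 1), repeat=n))
    best = float("-inf")
    for rest in itertools.product(joint_actions, repeat=len(future) - 1):
        s, total = state, 0.0
        for a, noise in zip((first_action,) + rest, future):
            s, r = step(s, a, noise)
            total += r
        best = max(best, total)
    return best


def hindsight_action(state, horizon=3, num_futures=20, seed=0):
    """Pick the first joint action with the best average hindsight return."""
    rng = random.Random(seed)
    n = len(state)
    # Each future pre-samples one noise value per machine per time step.
    futures = [
        [[rng.random() for _ in range(n)] for _ in range(horizon)]
        for _ in range(num_futures)
    ]
    joint_actions = list(itertools.product((0, 1), repeat=n))

    def score(a):
        returns = [best_return_in_hindsight(state, a, f) for f in futures]
        return sum(returns) / num_futures

    return max(joint_actions, key=score)


if __name__ == "__main__":
    print(hindsight_action((0, 0)))
```

The enumeration grows exponentially in both the number of action variables and the horizon, which is the scaling problem that a reduction to an optimization solver is intended to address.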
