Planning Under Uncertainty Using Reduced Models: Revisiting Determinization

We introduce a family of MDP reduced models characterized by two parameters: the maximum number of primary outcomes per action that are fully accounted for, and the maximum number of occurrences of the remaining exceptional outcomes that are planned for in advance. Reduced models can be solved much faster using heuristic search algorithms such as LAO*, because the number of reachable states drops dramatically. The commonly used determinization approach is a special case of this family of reductions, with one primary outcome per action and zero exceptional outcomes per plan. We present a framework to compute the benefits of planning with reduced models, relying on online planning when the number of exceptional outcomes exceeds the bound. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each of the two dimensions (allowing more than one primary outcome, or planning for a limited number of exceptions) can improve performance relative to standard determinization. The results place recent work on determinization in a broader context and lay the foundation for efficient and systematic exploration of the space of MDP model reductions.
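To make the two parameters concrete, here is a minimal Python sketch of how the outcomes of a single action might be reduced. It is an illustration under stated assumptions, not the construction used in the paper: the function name `reduced_outcomes`, the choice of the most probable outcomes as primary, the exception counter attached to each state, and the renormalization of primary probabilities once the exception budget is spent are all hypothetical choices made for this example.

```python
from typing import Hashable, List, Tuple

State = Hashable
Outcome = Tuple[State, float]  # (next state, probability)


def reduced_outcomes(
    outcomes: List[Outcome],
    num_primary: int,
    exceptions_used: int,
    max_exceptions: int,
) -> List[Tuple[Tuple[State, int], float]]:
    """Outcomes of one action in a reduced model over (state, counter) pairs.

    The counter records how many exceptional outcomes have occurred so far.
    """
    if num_primary < 1:
        raise ValueError("at least one primary outcome is required")

    # Assumption: the most probable outcomes are treated as primary.
    ranked = sorted(outcomes, key=lambda o: o[1], reverse=True)
    primary, exceptional = ranked[:num_primary], ranked[num_primary:]

    if exceptions_used < max_exceptions:
        # Budget remains: keep the full distribution and count each exception.
        result = [((s, exceptions_used), p) for s, p in primary]
        result += [((s, exceptions_used + 1), p) for s, p in exceptional]
        return result

    # Budget exhausted: plan with primary outcomes only (assumed renormalized).
    total = sum(p for _, p in primary)
    return [((s, exceptions_used), p / total) for s, p in primary]


if __name__ == "__main__":
    action = [("a", 0.7), ("b", 0.2), ("c", 0.1)]
    # One primary outcome, zero exceptions: most-likely-outcome determinization.
    print(reduced_outcomes(action, num_primary=1, exceptions_used=0, max_exceptions=0))
    # One primary outcome, one planned exception: full branching until it is used.
    print(reduced_outcomes(action, num_primary=1, exceptions_used=0, max_exceptions=1))
```

With one primary outcome and a budget of zero exceptions, every action collapses to a single successor, which is the determinization special case described above. Raising either parameter restores some of the original branching, at the cost of more reachable states.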
