Discovering hidden structure in factored MDPs

Markov Decision Processes (MDPs) describe a wide variety of planning scenarios, ranging from military operations planning to controlling a Mars rover. However, today's solution techniques scale poorly, limiting MDPs' practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs, helping to solve them faster and with less memory than state-of-the-art techniques. Our algorithms discover two complementary state abstractions: basis functions and nogoods. A basis function is a conjunction of literals; if the conjunction holds in a state, it guarantees the existence of at least one trajectory from that state to the goal. Conversely, a nogood is a conjunction whose presence in a state implies that no such trajectory exists, i.e., the state is a dead end. We compute basis functions by regressing goal descriptions through a determinized version of the MDP; nogoods are constructed by a novel machine learning algorithm that uses basis functions as training data. These state abstractions can be leveraged in several ways. We describe three diverse approaches: GOTH, a heuristic function for use in heuristic search algorithms such as RTDP; ReTrASE, an MDP solver that performs modified Bellman backups on basis functions instead of states; and SixthSense, a method for quickly detecting dead-end states. In essence, our work integrates ideas from deterministic planning and basis-function-based approximation, yielding methods that outperform existing approaches by a wide margin.
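To make the two abstractions concrete, below is a minimal Python sketch, assuming states and conjunctions are represented as frozensets of ground literals. All names here (holds, classify, regress, and the toy rover domain) are illustrative placeholders, not the paper's implementation; the regression step follows the standard STRIPS rule and omits delete effects and applicability checks for brevity.

```python
# Minimal sketch: basis functions and nogoods as conjunctions of literals.
# Assumption: states and conjunctions are frozensets of ground literals;
# all names below are illustrative, not the paper's actual implementation.

def holds(conjunction, state):
    """A conjunction holds in a state iff all of its literals appear there."""
    return conjunction <= state

def classify(state, basis_functions, nogoods):
    """Apply the two complementary abstractions to a single state."""
    if any(holds(b, state) for b in basis_functions):
        return "goal-reachable"   # some trajectory to the goal is guaranteed
    if any(holds(n, state) for n in nogoods):
        return "dead-end"         # no trajectory to the goal exists
    return "unknown"              # the abstractions are silent about this state

def regress(conjunction, precond, add_effects):
    """One step of goal regression through a determinized action (standard
    STRIPS rule; delete effects and applicability checks omitted for brevity).
    Any state satisfying the result can reach `conjunction` in one step."""
    return frozenset((conjunction - add_effects) | precond)

# Hypothetical usage on a toy rover domain:
goal = frozenset({"sample-delivered"})
deliver = {"precond": frozenset({"have-sample", "at-base"}),
           "adds": frozenset({"sample-delivered"})}
basis_functions = [goal, regress(goal, deliver["precond"], deliver["adds"])]
nogoods = [frozenset({"battery-dead"})]

print(classify(frozenset({"have-sample", "at-base"}), basis_functions, nogoods))
# -> "goal-reachable"
print(classify(frozenset({"battery-dead", "at-base"}), basis_functions, nogoods))
# -> "dead-end"
```

Representing conjunctions as sets makes the entailment check a cheap subset test, which is one reason both abstractions can be evaluated quickly inside a heuristic search loop; repeated regression through the determinized actions is what grows the pool of basis functions described above.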
