Self-Taught Decision Theoretic Planning with First Order Decision Diagrams

We present a new paradigm for planning by learning, where the planner is given a model of the world and a small set of states of interest, but no indication of optimal actions in these states. This additional information can help focus the planner on regions of the state space that are of interest and lead to improved performance. We demonstrate this idea by introducing novel model-checking reduction operations for First Order Decision Diagrams (FODD), a representation that has been used to implement decision-theoretic planning with Relational Markov Decision Processes (RMDP). Intuitively, these reductions modify the construction of the value function by removing any complex specifications that are irrelevant to the set of training examples, thereby focusing on the region of interest. We show that such training examples can be constructed on the fly from a description of the planning problem, and can thus bootstrap to obtain a self-taught planning system. Additionally, we provide a new heuristic to embed universal and conjunctive goals within the framework of RMDP planners, expanding the scope and applicability of such systems. We show that these ideas lead to significant improvements in both the speed and coverage of the planner, yielding state-of-the-art planning performance on problems from the International Planning Competition.

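As a loose illustration of the intuition (not the paper's actual algorithm), the following Python sketch applies an analogous pruning idea to a propositional decision diagram: branches that no training-example state ever reaches are collapsed away, so the resulting diagram only distinguishes cases that matter for the examples. The `Node`, `Leaf`, and `prune_unreached` names are hypothetical; the real reductions operate on first-order diagrams with variables and aggregation, which this sketch deliberately omits.

```python
# Hypothetical, propositional sketch of the "model-checking reduction" intuition.
# The paper's reductions act on first-order decision diagrams (FODDs); here we
# only show the idea of pruning parts of a diagram unreached by example states.

from dataclasses import dataclass
from typing import Union

State = dict  # maps proposition name -> bool


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    test: str                 # proposition tested at this node
    true_child: "Diagram"
    false_child: "Diagram"


Diagram = Union[Leaf, Node]


def prune_unreached(d: Diagram, examples: list) -> Diagram:
    """Collapse branches that no training-example state ever reaches."""
    if isinstance(d, Leaf):
        return d
    true_states = [s for s in examples if s.get(d.test, False)]
    false_states = [s for s in examples if not s.get(d.test, False)]
    # If every example takes the same branch, the other branch is irrelevant
    # to the training set and the node can be bypassed.
    if not false_states:
        return prune_unreached(d.true_child, true_states)
    if not true_states:
        return prune_unreached(d.false_child, false_states)
    return Node(d.test,
                prune_unreached(d.true_child, true_states),
                prune_unreached(d.false_child, false_states))


# Usage: both example states satisfy p, so the false-branch of the test on p
# is pruned and the diagram collapses to the subtree testing q.
diagram = Node("p", Node("q", Leaf(10.0), Leaf(5.0)), Leaf(0.0))
examples = [{"p": True, "q": True}, {"p": True, "q": False}]
print(prune_unreached(diagram, examples))
```

The design choice mirrored here is that the diagram is simplified relative to a set of concrete states rather than relative to all logically possible states, which is what lets the planner ignore distinctions outside the region of interest.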