Planning with Noisy Probabilistic Relational Rules

Noisy probabilistic relational rules are a promising world model representation for several reasons: they are compact, they generalize over world instantiations, they are usually interpretable, and they can be learned effectively from action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. Our first approach combines these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm, which builds look-ahead trees. Our second approach converts the rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference over beliefs about world states. We evaluate both approaches on planning in a simulated 3D robot manipulation scenario with an articulated manipulator and realistic physics, as well as on domains from the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.
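The first approach pairs a sampled rule model with UCT-style Monte-Carlo planning. The following is a minimal sketch of that idea, not the paper's implementation: it assumes a hypothetical toy domain where states are integers, a "rule" is just a distribution over single-step effects, and UCB1 selection is applied only at the root with random rollouts below it.

```python
import math
import random

# Hypothetical toy rule model (for illustration only): each action maps to a
# list of (probability, state_delta) outcomes, mirroring how noisy
# probabilistic rules define a distribution over effects.
RULES = {
    "forward": [(0.9, +1), (0.1, 0)],   # usually advances, sometimes a no-op
    "back":    [(0.8, -1), (0.2, 0)],
}
GOAL = 3  # assumed goal state for this sketch

def sample_effect(action, state):
    """Sample a successor state from the action's outcome distribution."""
    r, acc = random.random(), 0.0
    for p, delta in RULES[action]:
        acc += p
        if r < acc:
            return max(0, state + delta)
    return state

def reward(state):
    return 1.0 if state >= GOAL else 0.0

def uct_plan(state, horizon=6, n_sims=2000, c=1.4):
    """Choose an action via UCB1 at the root plus random rollouts."""
    counts = {a: 0 for a in RULES}
    values = {a: 0.0 for a in RULES}
    for i in range(1, n_sims + 1):
        # UCB1: untried actions first, then exploitation + exploration bonus.
        a = max(RULES, key=lambda a: float("inf") if counts[a] == 0
                else values[a] / counts[a] + c * math.sqrt(math.log(i) / counts[a]))
        # Simulate the chosen action, then a random rollout to the horizon.
        s, ret = sample_effect(a, state), 0.0
        for _ in range(horizon):
            ret += reward(s)
            if s >= GOAL:
                break
            s = sample_effect(random.choice(list(RULES)), s)
        counts[a] += 1
        values[a] += ret
    return max(RULES, key=lambda a: values[a] / max(counts[a], 1))

random.seed(0)
best = uct_plan(state=0)
```

Because sampling from compact rules is cheap, this scheme scales with the number of simulations rather than the size of the grounded state space, which is what makes the combination attractive in large relational domains.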
