Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning

In robotic domains, learning and planning are complicated by continuous state spaces, continuous action spaces, and long task horizons. In this work, we address these challenges with Neuro-Symbolic Relational Transition Models (NSRTs), a novel class of models that are data-efficient to learn, compatible with powerful robotic planning methods, and generalizable over objects. NSRTs have both symbolic and neural components, enabling a bilevel planning scheme where symbolic AI planning in an outer loop guides continuous planning with neural models in an inner loop. Experiments in four robotic planning domains show that NSRTs can be learned after only tens or hundreds of training episodes, and then used for fast planning in new tasks that require up to 60 actions and involve many more objects than were seen during training.
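
To make the bilevel scheme concrete, below is a minimal sketch of the planning loop: an outer symbolic planner proposes candidate operator skeletons, and an inner loop tries to refine each skeleton into continuous actions using learned neural samplers and transition models. This is an illustration under assumptions, not the paper's actual interface; all names here (`symbolic_plans`, `sample_params`, `predict_next`, `holds`, `expected_effects`) are hypothetical stand-ins.

```python
"""Minimal sketch of a bilevel planning loop with learned symbolic
operators, neural samplers, and neural transition models. All function
and attribute names are hypothetical placeholders for illustration."""

from typing import Callable, Iterator, List, Optional


def bilevel_plan(
    state,                                    # initial continuous state
    goal,                                     # symbolic goal (set of ground atoms)
    symbolic_plans: Callable[..., Iterator],  # outer loop: yields candidate skeletons
    sample_params: Callable,                  # neural sampler: (ground op, state) -> params
    predict_next: Callable,                   # neural transition model: rolls state forward
    holds: Callable,                          # checks whether symbolic atoms hold in a state
    max_skeletons: int = 10,
    samples_per_step: int = 10,
) -> Optional[List]:
    """Return a sequence of (ground op, params) pairs, or None on failure."""
    for i, skeleton in enumerate(symbolic_plans(state, goal)):
        if i >= max_skeletons:
            break  # give up after a budget of abstract plans
        plan = refine(state, goal, skeleton, sample_params,
                      predict_next, holds, samples_per_step)
        if plan is not None:
            return plan  # inner loop realized this skeleton
    return None          # all candidate skeletons exhausted


def refine(state, goal, skeleton, sample_params, predict_next,
           holds, samples_per_step):
    """Inner loop: greedily sample continuous parameters for each ground
    operator in the skeleton, rolling the neural transition model forward.
    (A fuller version would backtrack over earlier steps on failure.)"""
    actions, s = [], state
    for ground_op in skeleton:
        for _ in range(samples_per_step):
            theta = sample_params(ground_op, s)         # neural sampler proposal
            s_next = predict_next(s, ground_op, theta)  # predicted next state
            if holds(ground_op.expected_effects, s_next):  # abstract subgoal check
                actions.append((ground_op, theta))
                s = s_next
                break
        else:
            return None  # no sample achieved this step's abstract effects
    return actions if holds(goal, s) else None
```

The key design point is the interplay between the two levels: searching over multiple skeletons lets the symbolic outer loop recover when the neural inner loop cannot realize a particular abstract plan in continuous space.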
