Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs

We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely geometric scene graph and symbolic scene graph. This hierarchical representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses graph neural networks to process these scene graphs for predicting high-level task plans and low-level motions. We demonstrate that our method scales to long-horizon tasks and generalizes well to novel task goals. We validate our method in a kitchen storage task in both physical simulation and the real world. Experiments show that our method achieves over 70% success rate and nearly 90% of subgoal completion rate on the real robot while being four orders of magnitude faster in computation time compared to standard search-based task-and-motion planner. 1

[1]  Ruben Villegas,et al.  Learning Latent Dynamics for Planning from Pixels , 2018, ICML.

[2]  Leslie Pack Kaelbling,et al.  Active Model Learning and Diverse Action Sampling for Task and Motion Planning , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[3]  Silvio Savarese,et al.  Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation , 2019, CoRL.

[4]  Marc Toussaint,et al.  Differentiable Physics and Stable Modes for Tool-Use and Manipulation Planning , 2018, Robotics: Science and Systems.

[5]  Rui Xu,et al.  Discovering Symbolic Models from Deep Learning with Inductive Biases , 2020, NeurIPS.

[6]  Oliver Kroemer,et al.  Graph-Structured Visual Imitation , 2019, CoRL.

[7]  Pieter Abbeel,et al.  Combined task and motion planning through an extensible planner-independent interface layer , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Swarat Chaudhuri,et al.  Incremental Task and Motion Planning: A Constraint-Based Approach , 2016, Robotics: Science and Systems.

[9]  Leslie Pack Kaelbling,et al.  Learning to guide task and motion planning using score-space representation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[11]  Danfei Xu,et al.  Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Luca Carlone,et al.  3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans , 2020, RSS 2020.

[13]  Dylan Hadfield-Menell,et al.  Guided search for task and motion plans using learned heuristics , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Matthew Botvinick,et al.  MONet: Unsupervised Scene Decomposition and Representation , 2019, ArXiv.

[15]  Leslie Pack Kaelbling,et al.  Hierarchical task and motion planning in the now , 2011, 2011 IEEE International Conference on Robotics and Automation.

[16]  Steven M. LaValle,et al.  Planning algorithms , 2006 .

[17]  Chelsea Finn,et al.  Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation , 2019, ICLR.

[18]  Sergey Levine,et al.  Learning Latent Plans from Play , 2019, CoRL.

[19]  Juan Carlos Niebles,et al.  Motion Reasoning for Goal-Based Imitation Learning , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Silvio Savarese,et al.  3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Jeannette Bohg,et al.  Object-Centric Task and Motion Planning in Dynamic Environments , 2020, IEEE Robotics and Automation Letters.

[22]  Sergey Levine,et al.  Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Marc Toussaint,et al.  Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning , 2015, IJCAI.

[24]  Leslie Pack Kaelbling,et al.  PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning , 2020, ICAPS.

[25]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[26]  Silvio Savarese,et al.  Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning , 2019 .

[27]  Jung-Su Ha,et al.  Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image , 2020, Robotics: Science and Systems.

[28]  Ingmar Posner,et al.  GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations , 2019, ICLR.

[29]  Silvio Savarese,et al.  Regression Planning Networks , 2019, NeurIPS.

[30]  Abhinav Gupta,et al.  Object-centric Forward Modeling for Model Predictive Control , 2019, CoRL.

[31]  Lydia E. Kavraki,et al.  Learning Feasibility for Task and Motion Planning in Tabletop Environments , 2019, IEEE Robotics and Automation Letters.

[32]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[33]  Sergey Levine,et al.  Planning with Goal-Conditioned Policies , 2019, NeurIPS.

[34]  Jessica B. Hamrick,et al.  Relational inductive bias for physical construction in humans and machines , 2018, CogSci.

[35]  Elise van der Pol,et al.  Contrastive Learning of Structured World Models , 2020, ICLR.

[36]  Chelsea Finn,et al.  Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors , 2020, NeurIPS.

[37]  Martin A. Riedmiller,et al.  Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.

[38]  Sungjin Ahn,et al.  SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition , 2020, ICLR.

[39]  Mohammad Norouzi,et al.  Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.