Paper Information - Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs

We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, consisting of a geometric scene graph and a symbolic scene graph. This hierarchical representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses graph neural networks to process these scene graphs and predict both high-level task plans and low-level motions. We demonstrate that our method scales to long-horizon tasks and generalizes well to novel task goals. We validate our method on a kitchen storage task in both physical simulation and the real world. Experiments show that our method achieves a success rate of over 70% and a subgoal completion rate of nearly 90% on the real robot, while being four orders of magnitude faster in computation time than a standard search-based task-and-motion planner.
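To make the two-level representation more concrete, the following is a minimal Python sketch of a geometric scene graph (objects with poses and sizes) and a symbolic scene graph (objects with discrete relations) built from it. The abstract does not say how the symbolic graph is obtained from the geometric one, so the hand-written `is_on` rule below is only a stand-in, and it does not reproduce the graph neural networks used in the paper. All class names, field names, the tolerance, and the example scene are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GeometricNode:
    """One object in the geometric scene graph (hypothetical layout)."""
    name: str
    position: tuple  # (x, y, z) centre of the object, in metres
    size: tuple      # (dx, dy, dz) axis-aligned bounding-box extents


@dataclass
class SymbolicGraph:
    """Objects plus discrete relations stored as (subject, predicate, object)."""
    nodes: list
    edges: set = field(default_factory=set)


def is_on(a, b, tol=0.02):
    """Illustrative rule: a rests on b if a's bottom face touches b's top
    face (within tol) and a's centre lies over b's footprint."""
    a_bottom = a.position[2] - a.size[2] / 2
    b_top = b.position[2] + b.size[2] / 2
    over = all(abs(a.position[i] - b.position[i]) <= b.size[i] / 2
               for i in (0, 1))
    return over and abs(a_bottom - b_top) <= tol


def to_symbolic(geo_nodes):
    """Abstract a list of geometric nodes into a symbolic scene graph."""
    graph = SymbolicGraph(nodes=[n.name for n in geo_nodes])
    for a in geo_nodes:
        for b in geo_nodes:
            if a is not b and is_on(a, b):
                graph.edges.add((a.name, "on", b.name))
    return graph


# Hypothetical scene: a can stacked on a box, which sits on a table.
scene = [
    GeometricNode("table", (0.0, 0.0, 0.40), (1.0, 1.0, 0.80)),
    GeometricNode("box", (0.2, 0.1, 0.85), (0.2, 0.2, 0.10)),
    GeometricNode("can", (0.2, 0.1, 0.95), (0.06, 0.06, 0.10)),
]
symbolic = to_symbolic(scene)
print(sorted(symbolic.edges))
```

In this sketch a goal could be written as a set of desired symbolic edges, with a high-level planner operating on the symbolic graph and a motion generator operating on the geometric one. That division of labor follows the abstract's description only loosely, and the paper's actual interfaces may differ.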
