USA-Net: Unified Semantic and Affordance Representations for Robot Memory

For robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners which don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/
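To make the idea of gradient-based planning over a differentiable map concrete, the following is a minimal toy sketch and not the USA-Net implementation. It assumes a 2D world where the affordance term is a smooth Gaussian bump around a single hypothetical obstacle, and the semantic term is a squared distance to a hypothetical goal location standing in for similarity to a language query in CLIP embedding space. All names, weights, and constants (`START`, `GOAL`, `OBSTACLE`, `W_LEN`, `W_OBS`, `W_SEM`, `plan`) are illustrative assumptions.

```python
import math

# All values below are illustrative assumptions, not taken from the paper.
START = (0.0, 0.0)
GOAL = (1.0, 1.0)        # stand-in for the location that best matches the query
OBSTACLE = (0.5, 0.4)    # center of a single smooth obstacle
SIGMA = 0.15             # obstacle width
W_LEN, W_OBS, W_SEM = 1.0, 0.5, 5.0


def obstacle(p):
    """Smooth (differentiable) collision cost: a Gaussian bump at OBSTACLE."""
    dx, dy = p[0] - OBSTACLE[0], p[1] - OBSTACLE[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * SIGMA ** 2))


def total_cost(path):
    """Path length + affordance (collision) + semantic (goal) terms."""
    length = sum((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
                 for a, b in zip(path, path[1:]))
    collide = sum(obstacle(p) for p in path[1:])
    end = path[-1]
    semantic = (end[0] - GOAL[0]) ** 2 + (end[1] - GOAL[1]) ** 2
    return W_LEN * length + W_OBS * collide + W_SEM * semantic


def gradient(path):
    """Analytic gradient of total_cost w.r.t. every waypoint except the start."""
    grads = [(0.0, 0.0)]  # the start waypoint is held fixed
    last = len(path) - 1
    for i in range(1, len(path)):
        p, prev = path[i], path[i - 1]
        gx = 2 * W_LEN * (p[0] - prev[0])
        gy = 2 * W_LEN * (p[1] - prev[1])
        if i < last:
            nxt = path[i + 1]
            gx -= 2 * W_LEN * (nxt[0] - p[0])
            gy -= 2 * W_LEN * (nxt[1] - p[1])
        else:
            # Only the final waypoint is pulled toward the semantic goal.
            gx += 2 * W_SEM * (p[0] - GOAL[0])
            gy += 2 * W_SEM * (p[1] - GOAL[1])
        c = obstacle(p)
        gx += W_OBS * c * -(p[0] - OBSTACLE[0]) / SIGMA ** 2
        gy += W_OBS * c * -(p[1] - OBSTACLE[1]) / SIGMA ** 2
        grads.append((gx, gy))
    return grads


def straight_line(n_segments=10):
    """Initial guess: evenly spaced waypoints from START to GOAL."""
    return [(START[0] + (GOAL[0] - START[0]) * i / n_segments,
             START[1] + (GOAL[1] - START[1]) * i / n_segments)
            for i in range(n_segments + 1)]


def plan(steps=500, lr=0.01):
    """Plain gradient descent on the waypoints of the path."""
    path = straight_line()
    for _ in range(steps):
        grads = gradient(path)
        path = [path[0]] + [(p[0] - lr * g[0], p[1] - lr * g[1])
                            for p, g in zip(path[1:], grads[1:])]
    return path


if __name__ == "__main__":
    optimized = plan()
    print("initial cost:  ", total_cost(straight_line()))
    print("optimized cost:", total_cost(optimized))
```

Because every term is differentiable, the length, collision, and goal objectives are traded off in a single optimization rather than in separate pipelines. A real system would replace the hand-written terms with learned, queryable fields and would likely use automatic differentiation instead of analytic gradients.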
