Habitat 2.0: Training Home Assistants to Rearrange their Habitat

We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack – data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spaces) with articulated objects (e.g., cabinets and drawers that can open/close); (ii) H2.0: a high-performance physics-enabled 3D simulator with speeds exceeding 25,000 simulation steps per second (850× real-time) on an 8-GPU node, representing 100× speed-ups over prior work; and (iii) Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidy the house, stock groceries, set the table) that test a range of mobile manipulation capabilities. These large-scale engineering contributions allow us to systematically compare deep reinforcement learning (RL) at scale and classical sense-plan-act (SPA) pipelines in long-horizon structured tasks, with an emphasis on generalization to new objects, receptacles, and layouts. We find that (1) flat RL policies struggle on HAB compared to hierarchical ones; (2) a hierarchy with independent skills suffers from ‘hand-off problems’; and (3) SPA pipelines are more brittle than RL policies.

Figure 1: A mobile manipulator (Fetch robot) simulated in Habitat 2.0 performing rearrangement tasks in a ReplicaCAD apartment – (left) opening a drawer before picking up an item from it, and (right) placing an object into the bowl after navigating to the table. Best viewed in motion at https://sites.google.com/view/habitat2.
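The throughput figures quoted in the abstract can be related by simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the quoted speed is a lower bound ("exceeding 25,000"), and the per-GPU figure assumes throughput is spread evenly across the 8 GPUs, which the abstract does not state.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# All names here are hypothetical; the inputs are the abstract's numbers.

def implied_real_time_rate(steps_per_second: float, real_time_multiple: float) -> float:
    """Simulation steps per second that would correspond to 1x real-time."""
    return steps_per_second / real_time_multiple


AGGREGATE_SPS = 25_000       # lower bound quoted for an 8-GPU node
REAL_TIME_MULTIPLE = 850     # "850x real-time"
NUM_GPUS = 8

real_time_rate = implied_real_time_rate(AGGREGATE_SPS, REAL_TIME_MULTIPLE)

# Assumption: throughput divides evenly across GPUs (average, not measured).
average_sps_per_gpu = AGGREGATE_SPS / NUM_GPUS

print(f"implied real-time rate: {real_time_rate:.1f} steps/s")
print(f"average per-GPU throughput: {average_sps_per_gpu:.0f} steps/s")
```

Under these assumptions, the two quoted numbers imply that 1× real-time corresponds to roughly 30 simulation steps per second.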


[47]  Ilya Kostrikov,et al.  Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels , 2020, ArXiv.

[48]  Hammad Mazhar,et al.  CHRONO: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics , 2013 .

[49]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[50]  Richard Fikes,et al.  STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving , 1971, IJCAI.

[51]  Lydia E. Kavraki,et al.  The Open Motion Planning Library , 2012, IEEE Robotics & Automation Magazine.

[52]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[53]  Roozbeh Mottaghi,et al.  ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects , 2020, ArXiv.

[54]  Danica Kragic,et al.  Hierarchical Fingertip Space: A Unified Framework for Grasp Planning and In-Hand Grasp Adaptation , 2016, IEEE Transactions on Robotics.

[55]  Siddhartha S. Srinivasa,et al.  DART: Dynamic Animation and Robotics Toolkit , 2018, J. Open Source Softw..

[56]  Steven M. LaValle,et al.  Planning algorithms , 2006 .

[57]  Robin R. Murphy,et al.  Introduction to AI Robotics , 2000 .

[58]  Leslie Pack Kaelbling,et al.  Online Replanning in Belief Space for Partially Observable Task and Motion Problems , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[59]  Roozbeh Mottaghi,et al.  ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Jitendra Malik,et al.  Combining Optimal Control and Learning for Visual Navigation in Novel Environments , 2019, CoRL.

[61]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[62]  Christian Duriez,et al.  On the use of simulation in robotics: Opportunities, challenges, and suggestions for moving forward , 2020, Proceedings of the National Academy of Sciences.

[63]  Danica Kragic,et al.  Data-Driven Grasp Synthesis—A Survey , 2013, IEEE Transactions on Robotics.

[64]  Yuandong Tian,et al.  Building Generalizable Agents with a Realistic and Rich 3D Environment , 2018, ICLR.

[65]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[66]  Pieter Abbeel,et al.  Motion planning with sequential convex optimization and convex collision checking , 2014, Int. J. Robotics Res..

[67]  Silvio Savarese,et al.  iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes , 2020, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[68]  Roland Siegwart,et al.  Mesh Manifold Based Riemannian Motion Planning for Omnidirectional Micro Aerial Vehicles , 2021, IEEE Robotics and Automation Letters.

[69]  Santhosh K. Ramakrishnan,et al.  Occupancy Anticipation for Efficient Exploration and Navigation , 2020, ECCV.

[70]  Chuang Gan,et al.  Curious Representation Learning for Embodied Intelligence , 2021, ArXiv.

[71]  Maren Bennewitz,et al.  Whole-body motion planning for manipulation of articulated objects , 2013, 2013 IEEE International Conference on Robotics and Automation.

[72]  Peter Welinder,et al.  ORRB - OpenAI Remote Rendering Backend , 2019, ArXiv.

[73]  I. Frosio,et al.  Accelerating Reinforcement Learning through GPU Atari Emulation , 2019, NeurIPS.

[74]  Andrew Bennett,et al.  CHALET: Cornell House Agent Learning Environment , 2018, ArXiv.

[75]  Siddhartha S. Srinivasa,et al.  The YCB object and Model set: Towards common benchmarks for manipulation research , 2015, 2015 International Conference on Advanced Robotics (ICAR).

[76]  Roozbeh Mottaghi,et al.  ManipulaTHOR: A Framework for Visual Object Manipulation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Roozbeh Mottaghi,et al.  Visual Room Rearrangement , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Ali Farhadi,et al.  AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.

[79]  Joshua B. Tenenbaum,et al.  The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI , 2021, 2022 International Conference on Robotics and Automation (ICRA).

[80]  Dieter Fox,et al.  6-DOF Grasping for Target-driven Object Manipulation in Clutter , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[81]  Silvio Savarese,et al.  ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation , 2020, ArXiv.

[82]  Chuang Gan,et al.  ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation , 2020, ArXiv.

[83]  Kalyan Sunkavalli,et al.  OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Byron Boots,et al.  Continuous-time Gaussian process motion planning via probabilistic inference , 2017, Int. J. Robotics Res..

[85]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[86]  Qi Wu,et al.  Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[87]  Matthias Nießner,et al.  Scan2CAD: Learning CAD Model Alignment in RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Byron Boots,et al.  Differentiable Gaussian Process Motion Planning , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[89]  WILLIAM H. WARREN,et al.  The Perception-Action Coupling , 1990 .

[90]  Sehoon Ha,et al.  Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[91]  Takeshi Oishi,et al.  Hand-Motion-guided Articulation and Segmentation Estimation , 2020, 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN).

[92]  Sanja Fidler,et al.  VirtualHome: Simulating Household Activities Via Programs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[93]  Pieter Abbeel,et al.  rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch , 2019, ArXiv.

[94]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[95]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.