AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require careful, manual data collection by humans for training. In this paper, we present AutoNeRF, a method to collect the data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies, including handcrafted frontier-based exploration and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. Empirical results show that NeRFs can be trained on actively collected data from just a single episode of experience in an unseen environment and then used for several downstream robotic tasks, and that trained modular exploration policies significantly outperform classical baselines.
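To make the two-stage structure described above concrete, here is a minimal, hypothetical Python sketch: an exploration policy collects posed RGB observations over a single autonomous episode, and a NeRF is then fitted to that trajectory. `DummyEnv`, `ExplorationPolicy`, and `fit_nerf` are illustrative stand-ins, not the authors' code or the Habitat API; a real implementation would use a Habitat-style simulator, a trained high-level planner with a classical path follower, and a fast NeRF backend (e.g., an Instant-NGP-style hash-grid model).

```python
# Illustrative sketch only; all class and function names are hypothetical.
import numpy as np

class DummyEnv:
    """Stand-in for a Habitat-style simulator returning RGB frames and poses."""
    def reset(self):
        return self._obs()

    def step(self, action):
        return self._obs()

    def _obs(self):
        return {
            "rgb": np.random.rand(128, 128, 3).astype(np.float32),
            "pose": np.eye(4, dtype=np.float32),  # camera-to-world transform
        }

class ExplorationPolicy:
    """Stand-in for the trained high-level planner; in the paper, a classical
    low-level path follower executes the long-term goals it emits."""
    def act(self, obs):
        # Hypothetical: a real policy would pick an exploration goal on the
        # map (e.g., driven by a coverage-style reward); here, a random stub.
        return np.random.uniform(-1.0, 1.0, size=2)

def collect_episode(env, policy, horizon=500):
    """Stage 1: one autonomous episode in an unseen scene. The posed RGB
    frames gathered here are the only supervision available to the NeRF."""
    frames, poses = [], []
    obs = env.reset()
    for _ in range(horizon):
        obs = env.step(policy.act(obs))
        frames.append(obs["rgb"])
        poses.append(obs["pose"])
    return np.stack(frames), np.stack(poses)

def fit_nerf(frames, poses):
    """Stage 2: fit an implicit scene representation to the trajectory.
    The resulting model would then support the downstream tasks above:
    view rendering, map reconstruction, planning, and pose refinement."""
    raise NotImplementedError("plug in a NeRF trainer here")

if __name__ == "__main__":
    frames, poses = collect_episode(DummyEnv(), ExplorationPolicy(), horizon=10)
    print(frames.shape, poses.shape)  # (10, 128, 128, 3) (10, 4, 4)
```

The key design point the sketch reflects is the decoupling of data collection from representation learning: the exploration policy never sees NeRF gradients, so any exploration strategy (frontier-based or learned) can be swapped into stage 1 without touching stage 2.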
