One-Shot Neural Fields for 3D Object Understanding

We present a unified and compact scene representation for robotics in which each object in the scene is described by a latent code capturing its geometry and appearance. This representation can be decoded for various tasks, such as novel-view rendering, 3D reconstruction (e.g., recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. At test time, we build the representation from a single RGB image that observes the scene from only one viewpoint. To do so, we leverage recent advances in Neural Radiance Fields (NeRF), which learn category-level priors on large multiview datasets and are then fine-tuned on novel objects from one or a few views. We extend the NeRF model with additional grasp outputs and explore ways to leverage this representation for robotics. We find that the recovered representation supports rendering from novel views, including of occluded object parts, as well as predicting successful stable grasps. Grasp poses can be decoded directly from our latent representation with an implicit grasp decoder. Experiments in both simulation and the real world demonstrate robust robotic grasping using this compact representation. Website: https://nerfgrasp.github.io
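The abstract says the per-object latent code can be decoded into novel views and into 3D geometry such as depth and point clouds. A minimal sketch of how that decoding typically works for a NeRF-style field, assuming standard NeRF volume-rendering quadrature: a field conditioned on a latent code z returns density and color at sample points along a camera ray, the per-sample weights give both the rendered color and an expected depth, and that depth back-projects through a pinhole camera to a 3D point. `toy_field` is a hypothetical analytic stand-in for the learned decoder (a colored sphere whose parameters are read from z), and `pixel_ray`, `render_ray`, the intrinsics, and the scene values are illustrative assumptions, not the paper's code.

```python
import numpy as np


def toy_field(points, z):
    """Hypothetical stand-in for a latent-conditioned decoder f(x, z) -> (sigma, rgb).
    Here z = [cx, cy, cz, radius, r, g, b] encodes a solid colored sphere."""
    center, radius, color = z[:3], z[3], z[4:7]
    inside = np.linalg.norm(points - center, axis=-1) < radius
    sigma = np.where(inside, 50.0, 0.0)              # opaque inside, empty outside
    rgb = np.where(inside[:, None], color, 0.0)      # (N, 3) per-sample color
    return sigma, rgb


def pixel_ray(u, v, fx, fy, cx, cy):
    """Pinhole ray direction in the camera frame, scaled so its z component is 1."""
    return np.array([(u - cx) / fx, (v - cy) / fy, 1.0])


def render_ray(field, z, origin, direction, near=0.0, far=4.0, n_samples=401):
    """NeRF-style quadrature along one ray; returns (rgb, depth, opacity)."""
    t = np.linspace(near, far, n_samples)            # t is z-depth because direction[2] == 1
    points = origin + t[:, None] * direction
    sigma, rgb = field(points, z)
    # Metric length of each interval; the last one is treated as effectively infinite.
    deltas = np.diff(t, append=1e10) * np.linalg.norm(direction)
    alpha = 1.0 - np.exp(-sigma * deltas)            # opacity of each interval
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # T_i = prod_{j<i} (1 - alpha_j)
    weights = trans * alpha
    return weights @ rgb, weights @ t, weights.sum()


z = np.array([0.0, 0.0, 2.0, 0.5, 0.8, 0.2, 0.1])   # sphere 2 units ahead, radius 0.5
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)     # illustrative intrinsics
d = pixel_ray(320, 240, **K)                          # principal point -> direction (0, 0, 1)
rgb, depth, opacity = render_ray(toy_field, z, np.zeros(3), d)
point = depth * d                                     # back-projected 3D point (camera frame)

# A ray through the image corner misses the sphere: zero opacity means free space.
miss_rgb, miss_depth, miss_opacity = render_ray(toy_field, z, np.zeros(3), pixel_ray(0, 0, **K))
```

Repeating `render_ray` over every pixel yields a rendered image and a depth map, and back-projecting every depth gives a point cloud; opacity (or density queried directly at 3D points) is the kind of quantity a voxel map or collision check would threshold.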
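The abstract also states that grasp poses are decoded directly from the latent code by an implicit grasp decoder. The following is a hypothetical sketch of what such a decoder's interface could look like: a small MLP that maps an object code z and a candidate 6-DoF gripper pose to a success probability, used here to rank sampled candidates. The latent size, the position-plus-quaternion grasp parameterization, the two-layer architecture, and the names `ImplicitGraspDecoder` and `best_grasps` are all assumptions; the weights are random, so the scores carry no meaning until such a decoder is trained on grasp labels.

```python
import numpy as np

LATENT_DIM = 64      # hypothetical size of the per-object latent code
GRASP_DIM = 7        # hypothetical grasp parameterization: position (3) + quaternion (4)


class ImplicitGraspDecoder:
    """Hypothetical implicit grasp decoder g(z, grasp) -> success probability.

    A two-layer MLP with random (untrained) weights; only the interface is meaningful.
    """

    def __init__(self, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        d_in = LATENT_DIM + GRASP_DIM
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, z, grasps):
        """z: (LATENT_DIM,) object code; grasps: (M, GRASP_DIM). Returns (M,) probabilities."""
        quat = grasps[:, 3:] / np.linalg.norm(grasps[:, 3:], axis=1, keepdims=True)
        codes = np.broadcast_to(z, (len(grasps), LATENT_DIM))   # same object code for every query
        x = np.concatenate([codes, grasps[:, :3], quat], axis=1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)              # ReLU hidden layer
        logits = (h @ self.w2 + self.b2)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))                    # sigmoid -> probability


def best_grasps(decoder, z, candidates, k=5):
    """Score candidate grasps with the decoder and return the top-k poses and probabilities."""
    probs = decoder(z, candidates)
    order = np.argsort(-probs)[:k]                              # highest probability first
    return candidates[order], probs[order]


rng = np.random.default_rng(1)
z = rng.normal(size=LATENT_DIM)          # stands in for the code recovered from the input image
candidates = np.concatenate(
    [rng.uniform(-0.1, 0.1, size=(100, 3)), rng.normal(size=(100, 4))], axis=1
)
decoder = ImplicitGraspDecoder()
poses, probs = best_grasps(decoder, z, candidates, k=5)
```

Because the decoder can be queried at arbitrary poses, one network can score any number of sampled candidates; the uniform/Gaussian candidate sampler above is likewise only an illustrative assumption.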
