BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these two concurrently running threads (pose graph optimization and Neural Object Field learning). Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches. Project page: https://bundlesdf.github.io
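
The abstract does not say how the pool of posed memory frames is maintained. As a rough illustration only, and not the authors' implementation, the sketch below keeps a pool of frames tagged with a 6-DoF pose (3x3 rotation plus translation) and admits a new frame only if its viewpoint differs enough from every frame already stored. The names `MemoryFrame`, `MemoryPool`, `maybe_add`, and the 10-degree threshold are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field


def rot_z(deg):
    """3x3 rotation about the z-axis, as nested lists (helper for the demo)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle_deg(r_a, r_b):
    """Geodesic angle between two rotations, in degrees.

    Uses trace(R_a^T R_b) = sum of elementwise products, and
    angle = acos((trace - 1) / 2).
    """
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.degrees(math.acos(cos_theta))


@dataclass
class MemoryFrame:
    """Hypothetical posed frame: an id plus a 6-DoF pose (rotation, translation)."""
    frame_id: int
    rotation: list
    translation: tuple = (0.0, 0.0, 0.0)


@dataclass
class MemoryPool:
    """Hypothetical pool that only keeps frames with sufficiently new viewpoints."""
    min_angle_deg: float = 10.0  # assumed threshold, not taken from the paper
    frames: list = field(default_factory=list)

    def maybe_add(self, frame):
        """Add the frame if it is at least min_angle_deg away from all stored frames."""
        is_novel = all(
            rotation_angle_deg(frame.rotation, kept.rotation) >= self.min_angle_deg
            for kept in self.frames
        )
        if is_novel:
            self.frames.append(frame)
        return is_novel


# Demo: the object rotates about z; only sufficiently distinct views are kept.
pool = MemoryPool(min_angle_deg=10.0)
for frame_id, angle in enumerate([0.0, 3.0, 15.0, 12.0, 30.0]):
    pool.maybe_add(MemoryFrame(frame_id, rot_z(angle)))

kept_ids = [f.frame_id for f in pool.frames]
print(kept_ids)  # [0, 2, 4]
```

In this toy setup a tracking thread could append to the pool while a reconstruction thread reads from it. A real system would also need locking and pose updates for stored frames, which are omitted here.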
