3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. Code, data, benchmarks, and pre-trained models are available online at http://3dmatch.cs.princeton.edu.

[1]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Xiaowu Chen,et al.  3D Mesh Labeling via Deep Convolutional Neural Networks , 2015, ACM Trans. Graph..

[3]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[6]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[7]  Geoffrey E. Hinton,et al.  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition-' Washington , D . C . , June , 1983 OPTIMAL PERCEPTUAL INFERENCE , 2011 .

[8]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[11]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[12]  Thomas A. Funkhouser,et al.  Structured Global Registration of RGB-D Scans in Indoor Environments , 2016, ArXiv.

[13]  Jonathan Tompson,et al.  Unsupervised Learning of Spatiotemporally Coherent Metrics , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Meng Wang,et al.  3D deep shape descriptor , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Leonidas J. Guibas,et al.  Shape2Pose , 2014, ACM Trans. Graph..

[16]  Fei-Fei Li,et al.  Learning Temporal Embeddings for Complex Video Analysis , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Cordelia Schmid,et al.  Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[18]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jitendra Malik,et al.  Recognizing Objects in Range Data Using Regional Point Descriptors , 2004, ECCV.

[20]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Jitendra Malik,et al.  Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[23]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[24]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[25]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[26]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Thomas A. Funkhouser,et al.  Fine-to-Coarse Global Registration of RGB-D Scans , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Federico Tombari,et al.  Unique Signatures of Histograms for Local Surface Description , 2010, ECCV.

[30]  Niloy J. Mitra,et al.  Super4PCS: Fast Global Pointcloud Registration via Smart Indexing , 2019 .

[31]  Kuan-Ting Yu,et al.  Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[32]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33]  Sebastian Scherer,et al.  3D Convolutional Neural Networks for landing zone detection from LiDAR , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Matthias Nießner,et al.  Learning to Navigate the Energy Landscape , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[35]  Dieter Fox,et al.  Unsupervised feature learning for 3D scene labeling , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[36]  Dieter Fox,et al.  Self-Supervised Visual Descriptor Learning for Dense Correspondence , 2017, IEEE Robotics and Automation Letters.

[37]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[40]  N. Mitra,et al.  4-points congruent sets for robust pairwise surface registration , 2008, SIGGRAPH 2008.

[41]  Vladlen Koltun,et al.  Fast Global Registration , 2016, ECCV.

[42]  Vladlen Koltun,et al.  Robust reconstruction of indoor scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[44]  Nico Blodow,et al.  Aligning point cloud views using persistent feature histograms , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[45]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).