论文信息 - 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. Code, data, benchmarks, and pre-trained models are available online at http://3dmatch.cs.princeton.edu.

[1] Andrew E. Johnson,et al. Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[2] Xiaowu Chen,et al. 3D Mesh Labeling via Deep Convolutional Neural Networks , 2015, ACM Trans. Graph..

[3] Jianxiong Xiao,et al. Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Gang Hua,et al. Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Matthias Nießner,et al. Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[6] Andrew W. Fitzgibbon,et al. KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[7] Geoffrey E. Hinton,et al. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition-' Washington , D . C . , June , 1983 OPTIMAL PERCEPTUAL INFERENCE , 2011 .

[8] Richard Szeliski,et al. Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9] Rahul Sukthankar,et al. MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Matthias Nießner,et al. BundleFusion , 2016, TOGS.

[11] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[12] Thomas A. Funkhouser,et al. Structured Global Registration of RGB-D Scans in Indoor Environments , 2016, ArXiv.

[13] Jonathan Tompson,et al. Unsupervised Learning of Spatiotemporally Coherent Metrics , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[14] Meng Wang,et al. 3D deep shape descriptor , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Leonidas J. Guibas,et al. Shape2Pose , 2014, ACM Trans. Graph..

[16] Fei-Fei Li,et al. Learning Temporal Embeddings for Complex Video Analysis , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17] Cordelia Schmid,et al. Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[18] Andrew Zisserman,et al. Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Jitendra Malik,et al. Recognizing Objects in Range Data Using Regional Point Descriptors , 2004, ECCV.

[20] Nassir Navab,et al. Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21] Jitendra Malik,et al. Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[23] Nico Blodow,et al. Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[24] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[25] Andrew J. Davison,et al. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[26] Yann LeCun,et al. Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Thomas A. Funkhouser,et al. Fine-to-Coarse Global Registration of RGB-D Scans , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Iasonas Kokkinos,et al. Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Federico Tombari,et al. Unique Signatures of Histograms for Local Surface Description , 2010, ECCV.

[30] Niloy J. Mitra,et al. Super4PCS: Fast Global Pointcloud Registration via Smart Indexing , 2019 .

[31] Kuan-Ting Yu,et al. Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[32] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33] Sebastian Scherer,et al. 3D Convolutional Neural Networks for landing zone detection from LiDAR , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[34] Matthias Nießner,et al. Learning to Navigate the Energy Landscape , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[35] Dieter Fox,et al. Unsupervised feature learning for 3D scene labeling , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[36] Dieter Fox,et al. Self-Supervised Visual Descriptor Learning for Dense Correspondence , 2017, IEEE Robotics and Automation Letters.

[37] Andrew W. Fitzgibbon,et al. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38] Andrew Owens,et al. SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[39] Marc Levoy,et al. A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[40] N. Mitra,et al. 4-points congruent sets for robust pairwise surface registration , 2008, SIGGRAPH 2008.

[41] Vladlen Koltun,et al. Fast Global Registration , 2016, ECCV.

[42] Vladlen Koltun,et al. Robust reconstruction of indoor scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Vincent Lepetit,et al. LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[44] Nico Blodow,et al. Aligning point cloud views using persistent feature histograms , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[45] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).