PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction

We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction. The core of our method is a deep convolutional neural network that takes in RGB, depth, and normal information of a planar patch in an image and outputs a descriptor that can be used to find coplanar patches from other images. We train the network on 10 million triplets of coplanar and non-coplanar patches, and evaluate on a new coplanarity benchmark created from commodity RGB-D scans. Experiments show that our learned descriptor outperforms alternatives extended for this new task by a significant margin. In addition, we demonstrate the benefits of coplanarity matching in a robust RGB-D reconstruction formulation. We find that coplanarity constraints detected with our method are sufficient on their own to obtain reconstruction results comparable to state-of-the-art frameworks on most scenes, and that they outperform other methods on standard benchmarks when combined with a simple keypoint method.
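The triplet training and descriptor lookup described in the abstract can be illustrated with a small sketch. The abstract does not state the exact loss, distance, or margin, so this assumes a standard triplet margin loss over Euclidean descriptor distance. The function names, the margin value, and the toy descriptors are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed loss form: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    anchor and positive stand for descriptors of coplanar patches,
    negative for a descriptor of a non-coplanar patch. The loss is zero
    once the coplanar pair is closer than the non-coplanar pair by at
    least `margin` (a hypothetical value here).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def nearest_patch(query, candidates):
    """Index of the candidate descriptor closest to the query descriptor."""
    return min(range(len(candidates)), key=lambda i: euclidean(query, candidates[i]))


# Toy 2-D descriptors; a real descriptor would come from the network.
anchor = [0.0, 0.0]
coplanar = [0.0, 0.0]
non_coplanar = [3.0, 4.0]

loss_good = triplet_margin_loss(anchor, coplanar, non_coplanar)  # well separated
loss_bad = triplet_margin_loss(anchor, non_coplanar, coplanar)   # roles swapped
best = nearest_patch(anchor, [non_coplanar, [1.0, 0.0]])
```

Under these assumptions, a well-separated triplet contributes no loss, while a triplet whose non-coplanar patch lies closer to the anchor than the coplanar one is penalized. At test time, candidate coplanar patches in other images would be ranked by descriptor distance, as `nearest_patch` does for a single query.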
