Purely Geometric Scene Association and Retrieval - A Case for Macro Scale 3D Geometry

We address the problems of measuring geometric similarity between 3D scenes, represented as point clouds or range data frames, and of associating them. Our approach leverages macro-scale 3D structural geometry: the relative configuration of arbitrary surfaces and the relationships among structures that are potentially far apart. We express this discriminative information in a viewpoint-invariant feature space, and the resulting features are encoded in a frame-level signature that can be used to measure geometric similarity. Such a characterization is robust to viewpoint changes as well as to noise and to incomplete or partially overlapping data. We show how it can be employed to select a diverse set of data frames that have structurally similar content, and how to validate whether views with similar geometric content are from the same scene. The problem is formulated as one of general-purpose retrieval from an unannotated, spatio-temporally unordered database. Empirical analysis indicates that the presented approach clearly outperforms baselines on depth / range data. Its depth-only performance is competitive with state-of-the-art approaches that use RGB or RGB-D inputs, including ones based on deep learning. Experiments show that retrieval performance holds up well with much sparser databases, which indicates the approach's robustness. The approach generalized well: it did not require dataset-specific training, and it scaled up in our experiments. Finally, we also demonstrate how a geometrically diverse selection of views can result in richer 3D reconstructions.
