Learning Camera Localization via Dense Scene Matching

Camera localization aims to estimate 6-DoF camera poses from RGB images. Traditional methods detect and match interest points between a query image and a prebuilt 3D model. Recent learning-based approaches encode scene structure into a scene-specific convolutional neural network (CNN) and are thus able to predict dense coordinates from RGB images. However, most of them require re-training or re-adaptation for each new scene and have difficulty handling large-scale scenes due to limited network capacity. We present a new method for scene-agnostic camera localization using dense scene matching (DSM), in which a cost volume is constructed between a query image and a scene. The cost volume and the corresponding coordinates are processed by a CNN to predict dense coordinates. Camera poses can then be solved by PnP algorithms. In addition, our method can be extended to the temporal domain, which yields a further performance boost at test time. Our scene-agnostic approach achieves accuracy comparable to existing scene-specific approaches, such as KFNet, on the 7-Scenes and Cambridge benchmarks. It also remarkably outperforms SANet, the state-of-the-art scene-agnostic dense coordinate regression network. The code is available at https://github.com/Tangshitao/DenseScene-Matching.
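
The matching step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `cost_volume` and `soft_coordinates`, the top-k selection, the cosine similarity, and the temperature value are all assumptions made for illustration, and the softmax-weighted average merely stands in for the CNN that processes the cost volume and coordinates in the paper.

```python
import numpy as np


def cost_volume(query_feat, scene_feat, scene_xyz, k=4):
    """Build a per-pixel cost volume between a query image and a scene.

    query_feat: (H, W, C) query feature map (hypothetical input).
    scene_feat: (N, C) features of N scene points (hypothetical input).
    scene_xyz:  (N, 3) 3D coordinates of the same scene points.
    Returns the k best similarities per pixel, shape (H, W, k), and the
    3D coordinates of those candidates, shape (H, W, k, 3).
    """
    h, w, c = query_feat.shape
    q = query_feat.reshape(-1, c)
    # L2-normalise so the dot product is a cosine similarity.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    s = scene_feat / (np.linalg.norm(scene_feat, axis=1, keepdims=True) + 1e-8)
    sim = q @ s.T                                   # (H*W, N)
    top = np.argsort(-sim, axis=1)[:, :k]           # best k candidates first
    cost = np.take_along_axis(sim, top, axis=1).reshape(h, w, k)
    coords = scene_xyz[top].reshape(h, w, k, 3)
    return cost, coords


def soft_coordinates(cost, coords, temperature=0.05):
    """Stand-in for the CNN: softmax-weighted average of candidate coordinates."""
    logits = cost / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights[..., None] * coords).sum(axis=-2)     # (H, W, 3)
```

The resulting dense 2D-to-3D correspondences (pixel position and predicted scene coordinate) would then be passed to a PnP solver, typically inside a RANSAC loop, to recover the camera pose.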
