Accurate Camera Registration in Urban Environments Using High-Level Feature Matching

We propose a method for accurate camera pose estimation in urban environments from single images and 2D maps made of the surrounding buildings’ outlines. Our approach bridges the gap between learning-based and geometric approaches: we use recent semantic segmentation techniques to extract the buildings’ edges and the façades’ normals from the images, and minimal solvers [14] to compute the camera pose accurately and robustly. We propose two such minimal solvers: one based on three correspondences between buildings’ corners in the image and in the 2D map, and another based on two corner correspondences plus one façade correspondence. We show on a challenging dataset that, compared to a recent state-of-the-art method [1], this approach is both faster and more accurate.
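To make the idea of a three-corner minimal solver concrete, the following is a minimal sketch under simplifying assumptions, and not necessarily the solver proposed in this work. It assumes a planar setting in which the camera has three unknowns (map position and heading), each detected building corner has already been converted into a horizontal bearing angle in the camera frame (measured counterclockwise from the viewing direction, in radians), and the three corners are already matched to 2D map points. The names `det3` and `pose_from_three_corners` are hypothetical. Each correspondence yields one equation that is linear in (cos φ, sin φ, t_x, t_y), so three correspondences leave a one-dimensional null space, which is then rescaled so that cos² φ + sin² φ = 1.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def pose_from_three_corners(map_pts, bearings):
    """Planar camera pose from three corner/bearing correspondences.

    map_pts  : three (x, y) corner positions in the 2D map.
    bearings : three angles (radians) of the corners in the camera frame
               (assumed convention: counterclockwise from the view axis).
    Returns (cx, cy, heading) or None for a degenerate configuration.
    """
    # A corner expressed in the camera frame is q = R^T p + t. It must be
    # parallel to its ray d = (cos b, sin b), i.e. cross(d, q) = 0, which
    # is linear in the unknown vector (c, s, tx, ty).
    rows = []
    for (px, py), b in zip(map_pts, bearings):
        dx, dy = math.cos(b), math.sin(b)
        rows.append([dx * py - dy * px, -dx * px - dy * py, -dy, dx])

    # Null vector of the 3x4 system via signed 3x3 minors.
    null = []
    for j in range(4):
        minor = [[r[k] for k in range(4) if k != j] for r in rows]
        null.append((-1) ** j * det3(minor))

    scale = math.hypot(null[0], null[1])
    if scale < 1e-12:
        return None  # degenerate: the rotation part is not constrained

    # The null vector is defined up to sign; keep the sign for which all
    # three corners lie in front of the camera along their rays.
    for sign in (1.0, -1.0):
        c, s, tx, ty = (sign * v / scale for v in null)
        in_front = True
        for (px, py), b in zip(map_pts, bearings):
            qx = c * px + s * py + tx
            qy = -s * px + c * py + ty
            if qx * math.cos(b) + qy * math.sin(b) <= 0.0:
                in_front = False
        if in_front:
            # t = -R^T C, hence the camera centre is C = -R t.
            cx = -(c * tx - s * ty)
            cy = -(s * tx + c * ty)
            return cx, cy, math.atan2(s, c)
    return None
```

In practice such a solver would typically be run inside a robust estimation loop such as RANSAC [17] over candidate corner correspondences. The configuration in which the camera and the three corners lie on a common circle is degenerate for this kind of bearing-only resection and is not handled separately in the sketch.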

[1] Mubarak Shah, et al. Accurate Image Localization Based on Google Maps Street View, 2010, ECCV.

[2] Roberto Cipolla, et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[4] Tsuhan Chen, et al. GPS Refinement and Camera Orientation Estimation from a Single Image and a 2D Map, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[5] Zuzana Kukelova, et al. A general solution to the P4P problem for camera with unknown focal length, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Patrick Pérez, et al. Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[7] Peter F. Sturm, et al. Pose estimation using both points and lines for geo-localization, 2011, 2011 IEEE International Conference on Robotics and Automation.

[8] Jan Dirk Wegner, et al. Large-Scale Semantic 3D Reconstruction: An Adaptive Multi-resolution Model for Multi-class Volumetric Labeling, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Jianliang Tang, et al. Complete Solution Classification for the Perspective-Three-Point Problem, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Mayank Bansal, et al. Geometric Urban Geo-localization, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Roberto Cipolla, et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Philip David, et al. Orientation descriptors for localization in urban environments, 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13] Marc Pollefeys, et al. Large Scale Visual Geo-Localization of Images in Mountainous Terrain, 2012, ECCV.

[14] Richard Szeliski, et al. City-Scale Location Recognition, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Zuzana Kukelova, et al. Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length, 2013, 2013 IEEE International Conference on Computer Vision.

[16] Michel Dhome, et al. Determination of the Attitude of 3D Objects from a Single Perspective View, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[17] Robert C. Bolles, et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, 1981, CACM.

[18] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[19] Robert M. Haralick, et al. Review and analysis of solutions of the three point perspective pose estimation problem, 1994, International Journal of Computer Vision.

[20] David Nistér, et al. An efficient solution to the five-point relative pose problem, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[21] Liang-Tien Chia, et al. Estimating camera pose from a single urban ground-view omnidirectional image and a 2D building outline map, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22] Vincent Lepetit, et al. Instant Outdoor Localization and SLAM Initialization from 2.5D Maps, 2015, IEEE Transactions on Visualization and Computer Graphics.

[23] Marc Pollefeys, et al. Registration of Spherical Panoramic Images with Cadastral 3D Models, 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[24] Amir Roshan Zamir, et al. City scale geo-spatial trajectory estimation of a moving camera, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Vincent Lepetit, et al. Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Zuzana Kukelova, et al. Automatic Generator of Minimal Problem Solvers, 2008, ECCV.