Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization

We present an efficient method for geolocalization in urban environments that starts from a coarse location estimate provided by GPS and uses a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel, efficient, and robust method to optimize the pose: we train a deep network (a CNN) to predict the best direction in which to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then apply this CNN iteratively until it converges to a good pose. This approach avoids the need for reference images of the surroundings, which are difficult to acquire and match, whereas 2.5D models are broadly available. We can therefore apply it to places unseen during training.
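
The iterative refinement described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the 3-DoF pose (x, y, heading), the discrete move set `DIRECTIONS`, the step sizes, and the functions `refine_pose`, `toy_render`, and `toy_predict`. In particular, `toy_predict` is a hand-written stand-in for the trained CNN, and `toy_render` stands in for rendering the 2.5D building model.

```python
# Hypothetical discrete move set over a 3-DoF pose (x, y, heading).
# Index 0 means "stay": the predictor considers the pose good enough.
DIRECTIONS = [
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def refine_pose(segmentation, pose, render, predict_direction,
                step=(1.0, 1.0, 1.0), max_iters=100):
    """Repeatedly move the pose one step in the predicted direction.

    `render(pose)` and `predict_direction(segmentation, rendering)` are
    supplied by the caller; in the setting of the abstract they would be
    a renderer of the 2.5D model and a trained network, respectively.
    """
    for _ in range(max_iters):
        rendering = render(pose)
        move = DIRECTIONS[predict_direction(segmentation, rendering)]
        if move == (0, 0, 0):
            break  # predictor signals convergence
        pose = tuple(p + s * m for p, s, m in zip(pose, step, move))
    return pose


# --- Toy stand-ins (assumptions, only so the sketch runs end to end) ---

def toy_render(pose):
    # A real renderer would return an image; here the "rendering" is the pose.
    return pose


def toy_predict(segmentation, rendering):
    # The toy "segmentation" is simply the true pose, so the best direction
    # can be computed directly instead of being predicted by a network.
    errors = [t - r for t, r in zip(segmentation, rendering)]
    worst = max(range(len(errors)), key=lambda i: abs(errors[i]))
    if abs(errors[worst]) < 0.5:
        return 0
    return 1 + 2 * worst + (0 if errors[worst] > 0 else 1)


true_pose = (3.0, -2.0, 5.0)
refined = refine_pose(true_pose, (0.0, 0.0, 0.0), toy_render, toy_predict)
print(refined)
```

Because each iteration only needs a rendering and one forward pass of the predictor, the loop never consults reference images of the scene. The fixed step size and the "stay" class as a stopping rule are design choices of this sketch; the actual method may parameterize the update and the convergence test differently.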
