Semantic segmentation for 3D localization in urban environments

We show how to use simple 2.5D maps of buildings, together with recent advances in image segmentation and machine learning, to geo-localize an input image of an urban scene. We first extract the façades of the buildings and their edges from the image, and then search for the orientation and location at which a 3D rendering of the map aligns with these segments. We discuss how to use a 3D tracking system to acquire the data required to train the segmentation method, the segmentation itself, and how the segmentations are used to evaluate the quality of the alignment.
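
The alignment step can be illustrated with a minimal sketch. The sketch rests on several assumptions, none of them taken from the paper:

- Each façade in the 2.5D map is a vertical wall with a 2D footprint and a height (a hypothetical map format).
- The camera is a level pinhole camera with known height, so there is no pitch or roll.
- A binary façade mask stands in for the segmentation output, and the building edges mentioned above are ignored.
- Plain intersection-over-union serves as the alignment score.

The names `render_facades`, `iou`, and `localize` are illustrative. The exhaustive grid search over (x, y, yaw) is only one simple way to look for the best-aligned pose.

```python
import math
from itertools import product

# Hypothetical 2.5D map format: a façade is a vertical wall whose footprint runs
# from (x1, y1) to (x2, y2) on the ground and which rises to `height` metres.
# Each wall is a tuple (x1, y1, x2, y2, height).


def ray_hit(ox, oy, dx, dy, wall):
    """Distance t along the ray o + t*d to the wall footprint, or None if missed."""
    x1, y1, x2, y2, _ = wall
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex          # 2D cross product d x e
    if abs(denom) < 1e-12:
        return None                    # ray parallel to the wall
    wx, wy = x1 - ox, y1 - oy
    t = (wx * ey - wy * ex) / denom    # position along the ray
    s = (wx * dy - wy * dx) / denom    # position along the wall (0..1 means inside)
    return t if t > 1e-9 and 0.0 <= s <= 1.0 else None


def render_facades(walls, x, y, yaw, width=64, height=48, focal=40.0, cam_h=1.6):
    """Render a binary façade mask (1 = façade) seen from pose (x, y, yaw).

    Assumes a level pinhole camera (zero pitch and roll) at height cam_h.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    mask = [[0] * width for _ in range(height)]
    for u in range(width):
        offset = math.atan2(cx - u, focal)   # columns left of centre turn counterclockwise
        dx, dy = math.cos(yaw + offset), math.sin(yaw + offset)
        best = None
        for wall in walls:                   # nearest wall occludes the others
            t = ray_hit(x, y, dx, dy, wall)
            if t is not None and (best is None or t < best[0]):
                best = (t, wall)
        if best is None:
            continue                         # no façade in this column
        t, wall = best
        z = t * math.cos(offset)             # depth along the optical axis
        v_top = cy - focal * (wall[4] - cam_h) / z   # image row of the roof line
        v_bot = cy + focal * cam_h / z               # image row of the wall base
        for v in range(max(0, math.ceil(v_top)), min(height, math.floor(v_bot) + 1)):
            mask[v][u] = 1
    return mask


def iou(a, b):
    """Intersection-over-union of the façade pixels in two binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    return inter / union if union else 1.0


def localize(walls, seg, xs, ys, yaws, **cam):
    """Exhaustively search candidate poses; return the best pose and its score."""
    best_pose, best_score = None, -1.0
    for x, y, yaw in product(xs, ys, yaws):
        score = iou(render_facades(walls, x, y, yaw, **cam), seg)
        if score > best_score:
            best_pose, best_score = (x, y, yaw), score
    return best_pose, best_score
```

For example, rendering a mask at a known pose and passing it to `localize` as `seg` should recover that pose when it lies on the candidate grid. A more realistic variant would compare the rendering against per-pixel class probabilities from the segmentation network rather than hard labels, and would add an edge term.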
