3D Localization in Urban Environments from Single Images

In this paper, we tackle the problem of geolocalization in urban environments, overcoming the accuracy limitations of sensors such as GPS, compasses, and accelerometers. To this end, we combine recent advances in image segmentation and machine learning with the rich information provided by 2.5D maps of buildings. In particular, we first extract building façades and their edges, then use this information to estimate the orientation and location that best align the input image with a 3D rendering of the given 2.5D map. As this step builds on a learned semantic segmentation procedure, it requires substantial training data. We therefore also discuss how such training data can be generated efficiently with a 3D tracking system.
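
To make the alignment step concrete, the sketch below shows one plausible reading of it: candidate camera poses sampled around the noisy GPS/compass prior are scored by how well a semantic rendering of the 2.5D map agrees with the per-pixel façade/edge segmentation of the input image. This is a minimal illustration, not the authors' actual method; all names (segment_image, render_map_labels, candidate_poses) are hypothetical placeholders, and the paper's alignment is learned rather than a plain exhaustive search.

```python
# Minimal sketch: score pose hypotheses by semantic agreement with a
# 2.5D map rendering. All function names are hypothetical placeholders.
import numpy as np

FACADE, EDGE, BACKGROUND = 0, 1, 2  # assumed per-pixel label encoding


def segmentation_score(seg: np.ndarray, rendered: np.ndarray) -> float:
    """Agreement between predicted labels and the map rendering,
    measured over pixels the map claims belong to a building."""
    mask = rendered != BACKGROUND
    if not mask.any():
        return 0.0
    return float(np.mean(seg[mask] == rendered[mask]))


def localize(image, candidate_poses, segment_image, render_map_labels):
    """Score candidate (location, orientation) hypotheses sampled around
    the sensor prior and return the best-aligned pose."""
    seg = segment_image(image)              # per-pixel façade/edge labels
    best_pose, best_score = None, -1.0
    for pose in candidate_poses:            # e.g. a grid around the GPS fix
        rendered = render_map_labels(pose)  # label image from the 2.5D map
        score = segmentation_score(seg, rendered)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```

Restricting the score to building pixels makes it robust to clutter (cars, vegetation, sky) that the 2.5D map cannot model, which is one motivation for matching on semantic labels rather than raw image intensities.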
