Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching

Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (eg., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5x for panoramas with known orientation, by a factor of 3.3x for panoramas with unknown orientation, and by a factor of 6x for 180-degree FoV images with unknown orientation.

[1]  Silvio Savarese,et al.  Semantic Cross-View Matching , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[2]  Bin Sun,et al.  GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network , 2019, ArXiv.

[3]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Scott Workman,et al.  Wide-Area Image Geolocalization with Aerial Reference Imagery , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Hongdong Li,et al.  Lending Orientation to Neural Networks for Cross-View Geo-Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[9]  Xin Yu,et al.  Spatial-Aware Feature Aggregation for Image based Cross-View Geo-Localization , 2019, NeurIPS.

[10]  Hongdong Li,et al.  Optimal Feature Transport for Cross-View Image Geo-Localization , 2019, AAAI.

[11]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Scott Workman,et al.  Predicting Ground-Level Scene Layout from Aerial Imagery , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Salman Khan,et al.  Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  James Hays,et al.  Localizing and Orienting Street Views Using Overhead Imagery , 2016, ECCV.

[17]  Gim Hee Lee,et al.  CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Scott Workman,et al.  On the location dependence of convolutional neural network features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[19]  Jana Kosecka,et al.  Semantic Image Based Geolocation Given a Map , 2016, ArXiv.

[20]  Mubarak Shah,et al.  Bridging the Domain Gap for Ground-to-Aerial Image Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Andrea Vedaldi,et al.  Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.