Paper Information - Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching

Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (e.g., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited field of view (FoV) rather than a full panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves on state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5 for panoramas with known orientation, by a factor of 3.3 for panoramas with unknown orientation, and by a factor of 6 for 180-degree FoV images with unknown orientation.
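The orientation step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the feature shapes (height, width, channels), and the use of an unnormalized inner product as the correlation score are all assumptions made for illustration. The idea is that after the polar transform the horizontal axis of the aerial features corresponds to azimuth, so an unknown orientation becomes a circular horizontal shift that can be found by sliding the ground features across the aerial features with wrap-around.

```python
import numpy as np


def circular_correlation(aerial, ground):
    """Score the ground features against the aerial features at every azimuth shift.

    aerial: array of shape (H, W_a, C), features of the polar-transformed aerial image.
    ground: array of shape (H, W_g, C), features of the ground image, with W_g <= W_a
            (W_g < W_a models a limited field of view).
    Returns W_a scores; entry i is the inner product between the ground features
    and the aerial window that starts at column i and wraps around the panorama.
    """
    w_a = aerial.shape[1]
    w_g = ground.shape[1]
    # Append the first W_g - 1 columns so every wrapped window is contiguous.
    padded = np.concatenate([aerial, aerial[:, : w_g - 1]], axis=1)
    scores = np.empty(w_a)
    for i in range(w_a):
        scores[i] = np.sum(padded[:, i : i + w_g] * ground)
    return scores


def estimate_shift(aerial, ground):
    """Return the column shift with the highest correlation, plus all scores."""
    scores = circular_correlation(aerial, ground)
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aerial = rng.standard_normal((4, 64, 8))  # hypothetical feature map
    true_shift = 20
    # Simulate a 180-degree FoV view: half of the columns, starting at true_shift.
    ground = np.roll(aerial, -true_shift, axis=1)[:, :32]
    shift, _ = estimate_shift(aerial, ground)
    # Assumes the aerial feature width spans the full 360 degrees.
    print(shift, shift * 360.0 / aerial.shape[1])
```

The maximum score doubles as a similarity measure between the two views once they are aligned, which is how the abstract links orientation estimation to improved location recall. In practice an FFT-based correlation would be faster than the explicit loop used here.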
