VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

Cross-view image geo-localization aims to determine the location of a street-view query image by matching it against GPS-tagged reference images captured from an aerial view. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets. However, these results rely on the assumption that, for every query image, there exists a reference image centered exactly at the query's location, which does not hold in practical scenarios. In this paper, we redefine the problem under a more realistic assumption: the query image can be located anywhere in the area of interest, and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets, because queries and reference images no longer form perfectly aligned pairs and multiple reference images may cover the same query location. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR, for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and propose a novel end-to-end framework that localizes the query in a coarse-to-fine manner. Apart from image-level retrieval accuracy, we also evaluate localization accuracy in terms of actual distance (meters) computed from the raw GPS data. Extensive experiments under different application scenarios validate the effectiveness of the proposed method. The results indicate that cross-view geo-localization in this realistic setting remains challenging, which calls for further research in this direction. Our dataset and code will be made publicly available.
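The abstract reports localization accuracy as a distance in meters computed from raw GPS data but does not state how that distance is obtained. The following is a minimal sketch, assuming a spherical-Earth haversine distance between predicted and ground-truth (latitude, longitude) pairs. The function names (`haversine_m`, `localization_errors`, `accuracy_within`, `median_error`) and any meter threshold passed to them are hypothetical illustrations, not VIGOR's released evaluation code.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Assumption: spherical Earth with mean radius 6,371 km.
EARTH_RADIUS_M = 6_371_000.0

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def localization_errors(pred: Sequence[LatLon], gt: Sequence[LatLon]) -> List[float]:
    """Per-query distance (meters) between predicted and ground-truth GPS."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    return [haversine_m(p[0], p[1], g[0], g[1]) for p, g in zip(pred, gt)]


def accuracy_within(errors: Sequence[float], threshold_m: float) -> float:
    """Fraction of queries whose localization error is at most threshold_m."""
    if not errors:
        raise ValueError("errors must be non-empty")
    return sum(e <= threshold_m for e in errors) / len(errors)


def median_error(errors: Sequence[float]) -> float:
    """Median localization error in meters."""
    return statistics.median(errors)


if __name__ == "__main__":
    # Hypothetical predictions and ground truth for two queries.
    pred = [(40.7128, -74.0060), (40.7130, -74.0050)]
    gt = [(40.7128, -74.0060), (40.7128, -74.0060)]
    errs = localization_errors(pred, gt)
    print(errs, accuracy_within(errs, 50.0), median_error(errs))
```

Reporting both a thresholded accuracy and the median error is one common way to summarize a distance distribution; unlike top-1 retrieval accuracy, these meter-level numbers still reward a prediction that lands near the query even when no reference image is centered exactly on it.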