GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network

The task of cross-view image geo-localization aims to determine the geo-location (GPS coordinates) of a query ground-view image by matching it with the GPS-tagged aerial (satellite) images in a reference dataset. Due to the dramatic changes of viewpoint, matching the cross-view images is challenging. In this paper, we propose the GeoCapsNet based on the capsule network for ground-to-aerial image geo-localization. The network first extracts features from both ground-view and aerial images via standard convolution layers and the capsule layers further encode the features to model the spatial feature hierarchies and enhance the representation power. Moreover, we introduce a simple and effective weighted soft-margin triplet loss with online batch hard sample mining, which can greatly improve image retrieval accuracy. Experimental results show that our GeoCapsNet significantly outperforms the state-of-the-art approaches on two benchmark datasets.

[1]  James Hays,et al.  Localizing and Orienting Street Views Using Overhead Imagery , 2016, ECCV.

[2]  Mubarak Shah,et al.  Cross-View Image Matching for Geo-Localization in Urban Environments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Serge J. Belongie,et al.  Learning deep representations for ground-to-aerial geolocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jan Flusser,et al.  Image registration methods: a survey , 2003, Image Vis. Comput..

[7]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Josef Sivic,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Roland Siegwart,et al.  Robust Visual Place Recognition with Graph Kernels , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Gim Hee Lee,et al.  CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Mayank Bansal,et al.  Ultra-wide Baseline Facade Matching for Geo-localization , 2012, ECCV Workshops.

[13]  Nanning Zheng,et al.  Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Scott Workman,et al.  On the location dependence of convolutional neural network features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Scott Workman,et al.  Wide-Area Image Geolocalization with Aerial Reference Imagery , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Tomás Pajdla,et al.  Visual localization by linear combination of image descriptors , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[17]  Mubarak Shah,et al.  Image Geo-Localization Based on MultipleNearest Neighbor Feature Matching UsingGeneralized Graphs , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Gang Wang,et al.  Gated Siamese Convolutional Neural Network Architecture for Human Re-identification , 2016, ECCV.

[21]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Scott Workman,et al.  Predicting Ground-Level Scene Layout from Aerial Imagery , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Steven M. Seitz,et al.  Accurate Geo-Registration by Ground-to-Aerial Image Matching , 2014, 2014 2nd International Conference on 3D Vision.

[24]  Geoffrey E. Hinton,et al.  Dynamic Routing Between Capsules , 2017, NIPS.

[25]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, CVPR.