Wide-Area Image Geolocalization with Aerial Reference Imagery

We propose using deep convolutional neural networks to address the problem of cross-view image geolocalization, in which the geolocation of a ground-level query image is estimated by matching it to georeferenced aerial images. We use state-of-the-art feature representations for ground-level images and introduce a cross-view training approach for learning a joint semantic feature representation for aerial images. We also propose a network architecture that fuses features extracted from aerial images at multiple spatial scales. To support training these networks, we introduce a massive database that contains pairs of aerial and ground-level images from across the United States. Our methods significantly outperform the state of the art on two benchmark datasets. We also show, qualitatively, that the proposed feature representations are discriminative at both local and continental spatial scales.
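The matching step described above can be illustrated with a minimal sketch. It assumes that ground-level and aerial images have already been mapped into a shared feature space, and that the query is localized by nearest-neighbor search over the aerial reference features. The function name `localize`, the Euclidean distance, the 16-dimensional features, and the synthetic data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def localize(query_feat, aerial_feats, aerial_latlon, k=1):
    """Return the k reference locations whose aerial features are closest
    to the query feature, as a list of ((lat, lon), distance) pairs.

    Assumption: features live in a shared space and Euclidean distance
    is a reasonable similarity measure. This is a hypothetical helper.
    """
    dists = [math.dist(query_feat, feat) for feat in aerial_feats]
    order = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    return [(aerial_latlon[i], dists[i]) for i in order]


# Synthetic stand-ins for CNN features of georeferenced aerial images.
random.seed(0)
DIM, N = 16, 500
aerial_feats = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N)]
aerial_latlon = [(random.uniform(25.0, 49.0), random.uniform(-124.0, -67.0))
                 for _ in range(N)]

# A ground-level query whose feature is a slightly perturbed copy of
# reference 123, mimicking a well-aligned cross-view representation.
true_idx = 123
query_feat = [v + random.gauss(0.0, 0.01) for v in aerial_feats[true_idx]]

matches = localize(query_feat, aerial_feats, aerial_latlon, k=5)
best_location, best_distance = matches[0]
```

Returning the top k candidates rather than a single location reflects how retrieval-style localization is commonly evaluated; a real system would replace the synthetic features with network outputs and likely use an approximate nearest-neighbor index for a continental-scale reference set.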
