Paper information - Learning deep representations for ground-to-aerial geolocalization

Learning deep representations for ground-to-aerial geolocalization

The recent availability of geo-tagged images and rich geospatial data has inspired a number of algorithms for image-based geolocalization. Most approaches predict the location of a query image by matching it to ground-level images with known locations (e.g., street-view data). However, most of the Earth does not have ground-level reference photos available. Fortunately, more complete coverage is provided by oblique aerial or “bird's eye” imagery. In this work, we localize a ground-level query image by matching it to a reference database of aerial imagery. We use publicly available data to build a dataset of 78K aligned cross-view image pairs. The primary challenge for this task is that traditional computer vision approaches cannot handle the wide baseline and appearance variation of these cross-view pairs. We use our dataset to learn a feature representation in which matching views are near one another and mismatched views are far apart. Our proposed approach, Where-CNN, is inspired by the success of deep learning in face verification and achieves significant improvements over traditional hand-crafted features and existing deep features learned from other large-scale databases. We show the effectiveness of Where-CNN in finding matches between street-view and aerial-view imagery and demonstrate the ability of our learned features to generalize to novel locations.
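The idea of a feature space in which matching views are near one another and mismatched views are far apart can be illustrated with a standard pairwise contrastive loss. The abstract does not state the exact objective used by Where-CNN, so the sketch below is an assumption for illustration only: the function names, the margin value, and the toy feature vectors are all hypothetical, and the feature vectors stand in for the outputs of a learned network.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(ground_feat, aerial_feat, is_match, margin=1.0):
    """Pairwise contrastive loss (assumed objective, not confirmed by the abstract).

    Matching pairs are penalized by their squared distance, which pulls them together.
    Mismatched pairs are penalized only while they are closer than `margin`,
    which pushes them apart until they are at least `margin` away.
    """
    d = euclidean_distance(ground_feat, aerial_feat)
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def rank_aerial_candidates(query_feat, aerial_feats):
    """Return indices of aerial candidates sorted from nearest to farthest."""
    return sorted(
        range(len(aerial_feats)),
        key=lambda i: euclidean_distance(query_feat, aerial_feats[i]),
    )


if __name__ == "__main__":
    # Toy 2-D features; real features would come from a trained network.
    query = [0.0, 0.0]
    candidates = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]]
    print(rank_aerial_candidates(query, candidates))
    print(contrastive_loss(query, candidates[0], is_match=True))
```

Under this assumption, localization at query time reduces to a nearest-neighbor search: the ground-level query is embedded once, and the aerial reference images are ranked by distance to it in the learned space.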
