CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps

Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off; using fewer but larger cells results in lower location accuracy while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with their respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves the state-of-the-art performance in location recognition on multiple benchmark datasets.

[1]  Hongdong Li,et al.  Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4]  Serge J. Belongie,et al.  Learning deep representations for ground-to-aerial geolocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Nathan Jacobs,et al.  Revisiting IM2GPS in the Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[10]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[11]  Marc Pollefeys,et al.  Handling Urban Location Recognition as a 2D Homothetic Problem , 2010, ECCV.

[12]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Scott Workman,et al.  Wide-Area Image Geolocalization with Aerial Reference Imagery , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Tomás Pajdla,et al.  Learning and Calibrating Per-Location Classifiers for Visual Place Recognition , 2013, CVPR.

[15]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jan-Michael Frahm,et al.  Predicting Good Features for Image Geo-Localization Using Per-Bundle VLAD , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Torsten Sattler,et al.  Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[20]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[21]  Lorenzo Torresani,et al.  Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Guang-Zhong Yang,et al.  From images to scenes: Compressing an image cluster into a single scene model for place recognition , 2011, 2011 International Conference on Computer Vision.

[23]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Bastian Leibe,et al.  Visual landmark recognition from Internet photo collections: A large-scale evaluation , 2014, Comput. Vis. Image Underst..

[25]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Torsten Sattler,et al.  Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[28]  Mubarak Shah,et al.  Image Geo-Localization Based on MultipleNearest Neighbor Feature Matching UsingGeneralized Graphs , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[31]  Mubarak Shah,et al.  Cross-View Image Matching for Geo-Localization in Urban Environments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Daniel P. Huttenlocher,et al.  Landmark classification in large-scale image collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Ilya Kostrikov,et al.  PlaNet - Photo Geolocation with Convolutional Neural Networks , 2016, ECCV.

[37]  Alexei A. Efros,et al.  Large-Scale Image Geolocalization , 2015, Multimodal Location Estimation of Videos and Images.

[38]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[40]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[41]  Jan-Michael Frahm,et al.  Learned Contextual Feature Reweighting for Image Geo-Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[43]  Noah Snavely,et al.  Graph-Based Discriminative Learning for Location Recognition , 2013, International Journal of Computer Vision.