Paper information - Learned Contextual Feature Reweighting for Image Geo-Localization

We address the problem of large-scale image geo-localization, in which the location of an image is estimated by identifying geo-tagged reference images that depict the same place. We propose a novel model for learning image representations that integrates context-aware feature reweighting in order to focus on regions that contribute positively to geo-localization. In particular, we introduce a Contextual Reweighting Network (CRN) that predicts the importance of each region in the feature map based on the image context. Our model is learned end-to-end for the image geo-localization task and requires no annotation other than image geo-tags for training. In experiments, the proposed approach significantly outperforms the previous state of the art on the standard geo-localization benchmark datasets. We also demonstrate that our CRN discovers task-relevant contexts without any additional supervision.
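The reweighting idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`box_context`, `reweight_and_pool`), the k x k averaging used as "context", the sigmoid mask, and the plain weighted sum pooling are all assumptions chosen for brevity. In the paper the weights are predicted by a learned network trained end-to-end; here the projection vector is random, purely to show the data flow from feature map to weight mask to a single image descriptor.

```python
import numpy as np

def box_context(features, k=3):
    """Average each channel over a k x k neighbourhood (zero-padded).

    Hypothetical stand-in for learned context filters.
    features: array of shape (C, H, W).
    """
    c, h, w = features.shape
    pad = k // 2
    padded = np.pad(features, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def reweight_and_pool(features, proj, bias=0.0, k=3):
    """Predict one weight per spatial location, then pool weighted features.

    proj: vector of shape (C,), an assumed linear scoring layer.
    Returns the (H, W) weight mask and an L2-normalised (C,) descriptor.
    """
    context = box_context(features, k)                   # (C, H, W)
    logits = np.tensordot(proj, context, axes=1) + bias  # (H, W)
    weights = 1.0 / (1.0 + np.exp(-logits))              # sigmoid, in (0, 1)
    # Simplified aggregation: weighted sum over all locations.
    descriptor = (features * weights).sum(axis=(1, 2))   # (C,)
    norm = np.linalg.norm(descriptor)
    if norm > 0:
        descriptor = descriptor / norm
    return weights, descriptor

# Toy input: 8 channels on a 5 x 6 grid, random scoring vector.
rng = np.random.default_rng(0)
features = rng.random((8, 5, 6))
proj = rng.standard_normal(8)
weights, descriptor = reweight_and_pool(features, proj)
```

Locations whose surrounding context scores low receive weights near zero and contribute little to the pooled descriptor, which is the behaviour the abstract attributes to context-aware reweighting.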
