On the location dependence of convolutional neural network features

As the availability of geotagged imagery has increased, so has interest in geolocation-related computer vision applications, ranging from wide-area image geolocalization to the extraction of environmental data from social network imagery. Encouraged by the recent success of deep convolutional networks for learning high-level features, we investigate the usefulness of deep learned features for such problems. We compare features extracted from various layers of convolutional neural networks and analyze their discriminative ability with regard to location. Our analysis spans several problem settings, including region identification, land-cover visualization in aerial imagery, and ground-image localization in regions without ground-image reference data (where we achieve state-of-the-art performance on a benchmark dataset). We present results on multiple datasets, including a new dataset that we introduce, which contains hundreds of thousands of ground-level and aerial images in a large region centered around San Francisco.
