Predicting Geo-informative Attributes in Large-Scale Image Collections Using Convolutional Neural Networks

Geographic location is a powerful property for organizing large-scale photo collections, but only a small fraction of online photos are geotagged. Most work on automatically estimating geotags from image content relies either on comparison against models of buildings or landmarks, or on matching against large reference collections of geotagged images. These approaches work well for frequently photographed places such as major cities and tourist destinations. They fail, however, for photos taken in sparsely photographed places where few reference photos exist. Here we consider how to recognize general geo-informative attributes of a photo instead of trying to estimate a precise geotag. Examples include the elevation gradient, population density, and demographics of the place where it was taken. We learn models for these attributes by training deep convolutional neural networks (CNNs) on a large, noisy set of geotagged images from Flickr. We evaluate on over a dozen attributes. Some attributes are very difficult to recognize automatically, while others can be estimated automatically with about the same accuracy as a human.
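The abstract does not say how per-photo training labels are derived from geotags. One plausible scheme is sketched below purely as an assumption. It looks up each photo's coordinates in a gridded geospatial map of the attribute, such as a population-density raster, and bins the resulting values into classes at quantile thresholds. The raster layout (row 0 starting at latitude 90, fixed-size square cells) and the helper names `cell_index`, `quantile_thresholds`, and `label_photos` are hypothetical. They are not taken from the paper. A noisy geotag in this scheme simply falls into the wrong cell and yields a noisy label.

```python
import bisect
from typing import List, Sequence, Tuple


def cell_index(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Map a (lat, lon) geotag to a (row, col) cell of a hypothetical global
    raster with square cells of cell_deg degrees; row 0 starts at lat 90."""
    n_rows = int(round(180.0 / cell_deg))
    n_cols = int(round(360.0 / cell_deg))
    row = int((90.0 - lat) // cell_deg)
    col = int((lon + 180.0) // cell_deg)
    # Clamp so that lat == -90 or lon == 180 land in the last row/column.
    return min(row, n_rows - 1), min(col, n_cols - 1)


def quantile_thresholds(values: Sequence[float], n_classes: int) -> List[float]:
    """Return n_classes - 1 cut points that split values into
    roughly equal-sized classes (simple empirical quantiles)."""
    s = sorted(values)
    return [s[(len(s) * k) // n_classes] for k in range(1, n_classes)]


def label_photos(
    geotags: Sequence[Tuple[float, float]],
    raster: Sequence[Sequence[float]],
    cell_deg: float,
    n_classes: int,
) -> List[int]:
    """Assumed labeling scheme: look up the attribute value under each
    geotag, then assign a class index 0..n_classes-1 by quantile bin."""
    values = []
    for lat, lon in geotags:
        r, c = cell_index(lat, lon, cell_deg)
        values.append(raster[r][c])
    thresholds = quantile_thresholds(values, n_classes)
    # bisect_right counts how many cut points each value meets or exceeds.
    return [bisect.bisect_right(thresholds, v) for v in values]


if __name__ == "__main__":
    # Toy 2x4 raster with 90-degree cells (values are illustrative only).
    raster = [[10, 20, 30, 40], [50, 60, 70, 80]]
    tags = [(45.0, -135.0), (45.0, 45.0), (-45.0, -135.0), (-45.0, 135.0)]
    print(label_photos(tags, raster, 90.0, 2))  # [0, 0, 1, 1]
```

Labels produced this way could then serve as classification targets for a CNN. Splitting at quantiles keeps the classes balanced, which is one reasonable choice when attribute values such as population density are heavily skewed. The paper may bin its attributes differently.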