A Data-Centric Approach for Image Scene Localization

Due to the ubiquity of GPS-equipped cameras such as smartphones, more photos are automatically tagged with their camera locations (referred to as geo-tagged images), and large-scale geo-tagged image datasets are now available on the Web. A significant portion of online images, such as travel and surveillance photos, may not be meaningful without location information, so image localization for untagged images has been widely studied. However, the camera location of an image, which is a single point, may be quite different from the location of the scene depicted in the image (referred to as the scene location), and this discrepancy renders image localization inaccurate. To address this problem, we propose a data-centric framework for image scene localization using CNN-based classification in three steps. First, the framework provides two mechanisms for constructing a reference image dataset tagged with scene locations. Second, a spatial-visual classification approach organizes the dataset spatially using an R-tree to generate a set of geographical regions that tightly bound the image scene locations, and we train a classifier whose classes correspond to the generated regions. Third, to enhance classification accuracy, we train a set of hierarchical classification models that exploit the spatial hierarchy of the R-tree, so that the trained models learn the visual features of images at different geographical granularities. We evaluate our framework using a geo-tagged image dataset obtained from Google Street View and demonstrate that using scene locations enables localizing images far more accurately than camera-location-based localization.
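The second step, turning scene locations into region classes, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scene locations are given as distinct (longitude, latitude) pairs, and it stands in for a full R-tree with Sort-Tile-Recursive (STR) packing, a standard bulk-loading scheme that produces R-tree leaf nodes. The function names and the `capacity` parameter are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]              # assumed (longitude, latitude) of a scene location
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(points: List[Point]) -> Box:
    """Return the minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def str_leaf_groups(points: List[Point], capacity: int) -> List[List[Point]]:
    """Group points into leaf nodes with Sort-Tile-Recursive packing.

    This is a stand-in for R-tree construction: each returned group holds
    at most `capacity` points, like one R-tree leaf.
    """
    n_leaves = math.ceil(len(points) / capacity)
    n_slices = math.ceil(math.sqrt(n_leaves))
    slice_size = n_slices * capacity

    by_x = sorted(points)  # sort by x (ties broken by y)
    groups: List[List[Point]] = []
    for i in range(0, len(by_x), slice_size):
        # Within each vertical slice, sort by y and cut into leaves.
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            groups.append(vertical_slice[j:j + capacity])
    return groups


def label_by_region(
    points: List[Point], capacity: int
) -> Tuple[List[Box], Dict[Point, int]]:
    """Return the leaf regions and a class id (region index) for each point."""
    groups = str_leaf_groups(points, capacity)
    regions = [bounding_box(group) for group in groups]
    labels = {p: class_id for class_id, group in enumerate(groups) for p in group}
    return regions, labels
```

Each region index would then serve as the class label of the images whose scene locations fall in that region. Applying the same packing to the resulting boxes would yield coarser regions, which is one way to picture the multi-granularity hierarchy used in the third step.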
