Learning Multi-context Aware Location Representations from Large-scale Geotagged Images

With the ubiquity of sensor-equipped smartphones, multimedia documents uploaded to the Internet commonly carry GPS coordinates. Using such geotags as an additional feature is intuitively appealing for improving the performance of location-aware applications. However, raw GPS coordinates are fine-grained location indicators that carry no semantic information. Existing methods for geotag semantic encoding mostly extract hand-crafted, application-specific location representations that depend heavily on large-scale supplementary data and therefore cannot run efficiently on mobile devices. In this paper, we present a machine-learning-based approach, termed GPS2Vec+, which learns rich location representations by capitalizing on worldwide geotagged images. Once trained, the model no longer depends on the auxiliary data, so it encodes geotags efficiently through inference alone. We extract visual and semantic knowledge from image content and user-generated tags, and transfer this information to locations by using geotagged images as a bridge. To adapt to different application domains, we further present an attention-based fusion framework that estimates the importance of the learnt location representations under different contexts for effective feature fusion. Our location representations yield significant performance improvements over state-of-the-art geotag encoding methods on image classification and venue annotation.
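
The idea of attention-based fusion mentioned above can be illustrated with a minimal sketch. It is not the GPS2Vec+ architecture, whose details are not given here: the dot-product scoring, the function names `softmax` and `attention_fuse`, the `query` context vector, and the toy three-dimensional vectors are all assumptions made for illustration. The sketch only shows how per-representation importance weights could be estimated from a context and used to combine several location representations into one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(representations, query):
    """Fuse equal-length vectors with attention weights.

    Hypothetical scoring (an assumption, not the paper's method): each
    representation is scored by its dot product with a context vector
    `query`; the scores are normalised with softmax and used as weights
    in a weighted sum.
    """
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in representations]
    weights = softmax(scores)
    dim = len(representations[0])
    fused = [
        sum(w * rep[i] for w, rep in zip(weights, representations))
        for i in range(dim)
    ]
    return fused, weights


# Toy example: two location representations (say, one learnt from image
# content and one from user tags) and a context that favours the first.
visual_rep = [1.0, 0.0, 0.0]
semantic_rep = [0.0, 1.0, 0.0]
query = [2.0, 0.0, 0.0]

fused, weights = attention_fuse([visual_rep, semantic_rep], query)
print(weights)  # the first weight is larger than the second
print(fused)
```

In a trained model the query and the scoring function would be learnt per application domain, so the same location representations could receive different weights for, say, image classification and venue annotation.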
