Discovering place-informative scenes and objects using social media photos

Understanding the visual discrepancy and heterogeneity of different places is of great interest to architectural design, urban design and tourism planning. However, previous studies have been limited by the lack of adequate data and efficient methods to quantify the visual aspects of a place. This work proposes a data-driven framework to explore the place-informative scenes and objects by employing deep convolutional neural network to learn and measure the visual knowledge of place appearance automatically from a massive dataset of photos and imagery. Based on the proposed framework, we compare the visual similarity and visual distinctiveness of 18 cities worldwide using millions of geo-tagged photos obtained from social media. As a result, we identify the visual cues of each city that distinguish that city from others: other than landmarks, a large number of historical architecture, religious sites, unique urban scenes, along with some unusual natural landscapes have been identified as the most place-informative elements. In terms of the city-informative objects, taking vehicles as an example, we find that the taxis, police cars and ambulances are the most place-informative objects. The results of this work are inspiring for various fields—providing insights on what large-scale geo-tagged data can achieve in understanding place formalization and urban design.

[1]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[3]  Vicente Ordonez,et al.  Learning High-Level Judgments of Urban Perception , 2014, ECCV.

[4]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[5]  Henriette Cramer,et al.  Aesthetic capital: what makes london look beautiful, quiet, and happy? , 2014, CSCW.

[6]  Alexei A. Efros,et al.  Where in the world? Human and computer geolocation of images , 2010 .

[7]  Ramesh Raskar,et al.  Deep Learning the City: Quantifying Urban Perception at a Global Scale , 2016, ECCV.

[8]  Chaogui Kang,et al.  Social Sensing: A New Approach to Understanding Our Socioeconomic Environments , 2015 .

[9]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[10]  Alexei A. Efros,et al.  What makes Paris look like Paris? , 2015, Commun. ACM.

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Michael F. Goodchild,et al.  Constructing places from spatial footprints , 2012, GEOCROWD '12.

[13]  Daniel Gatica-Perez,et al.  Loud and Trendy: Crowdsourcing Impressions of Social Ambiance in Popular Indoor Urban Places , 2014, ACM Multimedia.

[14]  Nathan Jacobs,et al.  Revisiting IM2GPS in the Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Bolei Zhou,et al.  Recognizing City Identity via Attribute Analysis of Geo-tagged Images , 2014, ECCV.

[16]  Krzysztof Janowicz,et al.  Extracting and understanding urban areas of interest using geotagged photos , 2015, Comput. Environ. Urban Syst..

[17]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[18]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[19]  Daniel Gatica-Perez,et al.  InnerView: Learning Place Ambiance from Social Media Images , 2016, ACM Multimedia.

[20]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[22]  César A. Hidalgo,et al.  The Collaborative Image of The City: Mapping the Inequality of Urban Perception , 2013, PloS one.

[23]  Raia Hadsell,et al.  Learning to Navigate in Cities Without a Map , 2018, NeurIPS.

[24]  Antonio Torralba,et al.  Describing Visual Scenes using Transformed Dirichlet Processes , 2005, NIPS.

[25]  Alexei A. Efros,et al.  Large-Scale Image Geolocalization , 2015, Multimodal Location Estimation of Videos and Images.

[26]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Daniel Gatica-Perez,et al.  Venues in Social Media: Examining Ambiance Perception Through Scene Semantics , 2017, ACM Multimedia.

[28]  Byoungkwon An,et al.  Looking Beyond the Visible Scene , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[30]  Hui Lin,et al.  Representing place locales using scene elements , 2018, Comput. Environ. Urban Syst..

[31]  Bolei Zhou,et al.  Landscape and Urban Planning , 2018 .

[32]  J. Jacobs The Death and Life of Great American Cities , 1962 .

[33]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Tat-Seng Chua,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, CVPR.

[35]  Hang-Bong Kang,et al.  Prediction of crime occurrence from multi-modal data using deep learning , 2017, PloS one.

[36]  M. Hidalgo,et al.  Place attachment: Conceptual and empirical questions , 2001 .

[37]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[38]  R. Bruce Hull,et al.  Place identity: symbols of self in the urban fabric , 1994 .

[39]  Jonathan Krause,et al.  Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States , 2017, Proceedings of the National Academy of Sciences.

[40]  Timothée Masquelier,et al.  Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition , 2015, Scientific Reports.

[41]  Samarth Brahmbhatt,et al.  DeepNav: Learning to Navigate Large Cities , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Hui Lin,et al.  Indoor Space Recognition using Deep Convolutional Neural Network: A Case Study at MIT Campus , 2016, ArXiv.

[43]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[44]  David M. W. Powers,et al.  Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation , 2011, ArXiv.

[45]  Hui Wang,et al.  A machine learning-based method for the large-scale evaluation of the qualities of the urban environment , 2017, Comput. Environ. Urban Syst..

[46]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.