Deep Learning the City: Quantifying Urban Perception at a Global Scale

Computer vision methods that quantify the perception of urban environments are increasingly being used to study the relationship between a city’s physical appearance and the behavior and health of its residents. Yet the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural network architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at a global scale.
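The abstract describes a Siamese-like network trained on image pairs with a joint classification and ranking loss. The NumPy sketch below illustrates one way such a loss can be set up. It is not the authors' exact formulation: the linear-plus-ReLU tower stands in for a CNN, and the function names (`siamese_forward`, `joint_loss`), the weighting `lam`, the `margin`, and the specific choice of a softmax cross-entropy classification term plus a margin hinge ranking term are all illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def siamese_forward(x_left, x_right, params):
    """Apply one shared-weight tower to both images of each pair.

    Hypothetical stand-in: a single linear layer + ReLU replaces the CNN.
    """
    f_left = relu(x_left @ params["W_feat"])
    f_right = relu(x_right @ params["W_feat"])
    # Ranking head: one scalar perceptual score per image, same weights for both sides.
    r_left = f_left @ params["w_rank"]
    r_right = f_right @ params["w_rank"]
    # Classification head: sees both feature vectors and predicts which image wins.
    logits = np.concatenate([f_left, f_right], axis=1) @ params["W_cls"]
    return r_left, r_right, logits

def softmax_cross_entropy(logits, labels):
    # Numerically stable log-softmax, then mean negative log-likelihood of the true class.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def hinge_ranking_loss(r_left, r_right, y, margin=1.0):
    # y = +1 if the volunteer chose the left image, -1 if they chose the right one.
    # The loss is zero once the chosen image outscores the other by at least `margin`.
    return np.maximum(0.0, margin - y * (r_left - r_right)).mean()

def joint_loss(x_left, x_right, y, params, lam=1.0, margin=1.0):
    r_left, r_right, logits = siamese_forward(x_left, x_right, params)
    cls_labels = (y < 0).astype(int)  # class 0: left wins, class 1: right wins
    return softmax_cross_entropy(logits, cls_labels) + lam * hinge_ranking_loss(
        r_left, r_right, y, margin
    )

if __name__ == "__main__":
    # Random toy data: n pairs of d-dimensional "images", h hidden features.
    rng = np.random.default_rng(0)
    d, h, n = 16, 8, 4
    params = {
        "W_feat": rng.normal(size=(d, h)),
        "w_rank": rng.normal(size=h),
        "W_cls": rng.normal(size=(2 * h, 2)),
    }
    x_left, x_right = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    y = np.array([1, -1, 1, -1])
    print(joint_loss(x_left, x_right, y, params))
```

Sharing `W_feat` between the two inputs is what makes the sketch Siamese. In this setup the ranking head yields a per-image score that could be applied to a single unseen image, while the classification head reasons about both images of the pair jointly; summing the two terms lets each supervise the shared features.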
