Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States

Significance We show that socioeconomic attributes such as income, race, education, and voting patterns can be inferred from cars detected in Google Street View images using deep learning. Our model works by discovering associations between cars and people. For example, if the number of sedans in a city is higher than the number of pickup trucks, that city is likely to vote for a Democrat in the next presidential election (88% chance); if not, then the city is likely to vote for a Republican (82% chance). The United States spends more than $250 million each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed several years. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may become an increasingly practical supplement to the ACS. Here, we present a method that estimates socioeconomic characteristics of regions spanning 200 US cities by using 50 million images of street scenes gathered with Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22 million automobiles in total (8% of all automobiles in the United States), were used to accurately estimate income, race, education, and voting patterns at the zip code and precinct level. (The average US precinct contains ∼1,000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographics may effectively complement labor-intensive approaches, with the potential to measure demographics with fine spatial resolution, in close to real time.

[1]  H. D. Brunk,et al.  Statistical inference under order restrictions : the theory and application of isotonic regression , 1973 .

[2]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[3]  Patricia L. Mokhtarian,et al.  What type of vehicle do people drive? The role of attitude and lifestyle in influencing vehicle type choice - eScholarship , 2004 .

[4]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[5]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Björn-Olav Dozo,et al.  Quantitative Analysis of Culture Using Millions of Digitized Books , 2010 .

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Carlin Romano Looking beyond the Visible , 2012 .

[9]  Ramesh Raskar,et al.  Streetscore -- Predicting the Perceived Safety of One Million Streetscapes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Byoungkwon An,et al.  Looking Beyond the Visible Scene , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Michael J. Cafarella,et al.  Using Social Media to Measure Labor Market Flows , 2014 .

[12]  Bolei Zhou,et al.  Recognizing City Identity via Attribute Analysis of Geo-tagged Images , 2014, ECCV.

[13]  Vicente Ordonez,et al.  Learning High-Level Judgments of Urban Perception , 2014, ECCV.

[14]  Francesc Moreno-Noguer,et al.  Neuroaesthetics in fashion: Modeling the perception of fashionability , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Gabriel Cadamuro,et al.  Predicting poverty and wealth from mobile phone metadata , 2015, Science.

[17]  Sang Michael Xie,et al.  Combining satellite imagery and machine learning to predict poverty , 2016, Science.

[18]  Ramesh Raskar,et al.  Computer vision uncovers predictors of physical urban change , 2017, Proceedings of the National Academy of Sciences.

[19]  Noah Snavely,et al.  StreetStyle: Exploring world-wide clothing styles from millions of photos , 2017, ArXiv.

[20]  S. Kruger The Decennial Census , 2017 .

[21]  Jonathan Krause,et al.  Fine-Grained Car Detection for Visual Census Estimation , 2017, AAAI.