Predicting neighborhoods’ socioeconomic attributes using restaurant data

Significance

High-resolution socioeconomic data are crucial for designing and implementing place-based policies, but they remain scarce in many developing cities and countries. We show that an easily accessible and regularly updated neighborhood attribute, restaurants, can be combined with machine-learning models to effectively predict a range of socioeconomic attributes. This approach lets us collect training samples from representative neighborhoods and then use the trained model to infer the attributes of unsampled neighborhoods in the same city in a granular, timely, and low-cost manner. The model’s strong cross-city transferability can also help bridge the “data gap” between cities: the model can be trained in cities with rich survey data and then applied to cities where such data are unavailable.

Abstract

Access to high-resolution, timely socioeconomic data, such as neighborhood-level data on population, employment, and enterprise activity, is critical for social scientists and policy makers who design and implement location-based policies. However, reliable local-scale socioeconomic data remain scarce in many developing countries and cities. Here, we show that an easily accessible and regularly updated location attribute, restaurants, can be used to accurately predict a range of socioeconomic attributes of urban neighborhoods. We merge restaurant data from an online platform with 3 microdatasets covering 9 Chinese cities. Using features extracted from restaurants, we train machine-learning models to estimate daytime and nighttime population, number of firms, and consumption level at various spatial resolutions. The trained model can explain 90 to 95% of the variation in these attributes across neighborhoods in the test dataset. We analyze the tradeoff between accuracy, spatial resolution, and number of training samples, as well as the heterogeneity of the predicted results across spatial locations, demographics, and firm industries.
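The within-city workflow described above (extract restaurant features for each neighborhood, fit a model on a sample of surveyed neighborhoods, then infer the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ pipeline: the three features (restaurant count, mean price per person, mean rating), the synthetic data, and the choice of closed-form ridge regression are all hypothetical choices made for the example. “Explaining 90 to 95% of the variation” corresponds to the coefficient of determination R² on held-out neighborhoods, which `r2_score` computes; the final loop illustrates how accuracy can be traded against the number of training samples.

```python
import numpy as np

def make_synthetic_city(n, rng, coef=(50.0, 8.0, 20.0), noise=100.0):
    """Hypothetical neighborhoods: rows are neighborhoods; columns are
    restaurant count, mean price per person, and mean rating (all assumed)."""
    X = np.column_stack([
        rng.poisson(40, n).astype(float),
        rng.normal(60.0, 15.0, n),
        rng.uniform(3.0, 5.0, n),
    ])
    # The target stands in for e.g. daytime population (arbitrary units).
    y = X @ np.asarray(coef) + rng.normal(0.0, noise, n)
    return X, y

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on features standardized with training stats."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    y_mean = y.mean()
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return mu, sigma, w, y_mean

def predict(model, X):
    mu, sigma, w, y_mean = model
    return ((X - mu) / sigma) @ w + y_mean

def r2_score(y_true, y_pred):
    """Share of variance explained: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
X, y = make_synthetic_city(1000, rng)

# A random subset plays the role of surveyed (training) neighborhoods;
# the remaining 700 stand in for unsampled neighborhoods to be inferred.
idx = rng.permutation(len(y))
train_idx, test_idx = idx[:300], idx[300:]
test_X, test_y = X[test_idx], y[test_idx]

model = fit_ridge(X[train_idx], y[train_idx])
test_r2 = r2_score(test_y, predict(model, test_X))
print(f"test R^2 with 300 training neighborhoods: {test_r2:.3f}")

# Accuracy vs. number of training samples: refit on smaller subsets.
for n_train in (10, 30, 100, 300):
    m = fit_ridge(X[train_idx[:n_train]], y[train_idx[:n_train]])
    print(n_train, round(r2_score(test_y, predict(m, test_X)), 3))
```

Ridge is used here only because it is a simple, well-conditioned linear baseline; any regressor that maps per-neighborhood restaurant features to a target could be substituted.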
Finally, we demonstrate the cross-city generality of this method by training the model in one city and then applying it directly to other cities. The transferability of this restaurant-based model can help bridge data gaps between cities, allowing all cities to benefit from the dividends of big data and algorithms.
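The cross-city test described above (train in one city, apply the frozen model directly to another) can be sketched as below. This is a hypothetical illustration, not the authors’ experiment: the two synthetic cities, their feature distributions, the shared coefficients `TRUE_COEF`, and the use of ordinary least squares via `np.linalg.lstsq` are assumptions for the example. City B’s labels are used only to score the transferred predictions, mimicking a city without survey data.

```python
import numpy as np

# Assumed shared relationship between restaurant features and the target
# (e.g. number of firms); whether such a relationship holds across real
# cities is exactly what a transferability test checks.
TRUE_COEF = np.array([50.0, 8.0, 20.0])

def synthetic_city(n, mean_count, mean_price, rng, noise=100.0):
    """Hypothetical city: columns are restaurant count, mean price, mean rating."""
    X = np.column_stack([
        rng.poisson(mean_count, n).astype(float),
        rng.normal(mean_price, 20.0, n),
        rng.uniform(3.0, 5.0, n),
    ])
    y = X @ TRUE_COEF + rng.normal(0.0, noise, n)
    return X, y

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
X_a, y_a = synthetic_city(800, mean_count=40, mean_price=60, rng=rng)  # data-rich city
X_b, y_b = synthetic_city(500, mean_count=25, mean_price=80, rng=rng)  # city without surveys

# Fit ordinary least squares using city A only.
beta, *_ = np.linalg.lstsq(add_intercept(X_a), y_a, rcond=None)

# Apply the frozen model to city B; B's labels are used only for scoring.
y_b_hat = add_intercept(X_b) @ beta
transfer_r2 = r2(y_b, y_b_hat)
print(f"R^2 in city B (trained on A): {transfer_r2:.3f}")
```

The transfer succeeds here because both synthetic cities share `TRUE_COEF` by construction; with real data, the gap between within-city and cross-city R² is what measures how well a model generalizes.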