Predicting and mapping neighborhood-scale health outcomes: A machine learning approach

Abstract: Estimating health outcomes at the neighborhood scale is important for promoting urban health, yet it is costly and time-consuming. In this paper, we present a machine-learning-enabled approach to predicting the prevalence of six common non-communicable chronic diseases at the census tract level. We apply our approach to the City of Austin and show that it can yield fairly accurate predictions. In searching for the best predictive models, we experiment with eight machine learning algorithms and 60 predictor variables that characterize the social environment, the physical environment, and the aspects and degrees of neighborhood disorder. Our analysis suggests that (a) sociodemographic and socioeconomic variables are the strongest predictors of tract-level health outcomes and (b) historical records of 311 service requests can be a useful complementary data source, as the information distilled from the 311 data often improves the models' performance. The machine learning models produced by this study can help the public and city officials evaluate future scenarios and understand how changes in neighborhood conditions can lead to changes in health outcomes. By analyzing where the largest discrepancies between predicted and actual values occur, we can also identify areas of best practice and areas in need of greater investment or policy intervention.
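The discrepancy analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the tract and predictor counts are arbitrary, and a closed-form ridge regression stands in for whichever of the eight algorithms performs best. The helper names `ridge_fit` and `out_of_fold_predictions` are hypothetical. The idea is to predict each tract with a model that did not see it during fitting, then rank tracts by the gap between observed and predicted prevalence.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def out_of_fold_predictions(X, y, n_folds=5, alpha=1.0, seed=0):
    """Predict every tract using a model fitted on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, intercept = ridge_fit(X[train_mask], y[train_mask], alpha)
        preds[test_idx] = X[test_idx] @ coef + intercept
    return preds


# Synthetic stand-in for tract-level data (assumption: 200 tracts, 10 predictors).
rng = np.random.default_rng(42)
n_tracts, n_predictors = 200, 10
X = rng.normal(size=(n_tracts, n_predictors))
true_coef = rng.normal(size=n_predictors)
y = 10.0 + X @ true_coef + rng.normal(scale=0.5, size=n_tracts)

preds = out_of_fold_predictions(X, y)
residuals = y - preds  # positive: observed prevalence exceeds the prediction

rmse = float(np.sqrt(np.mean(residuals ** 2)))
r2 = float(1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2))

# Tracts whose observed prevalence is furthest above or below the prediction.
worse_than_predicted = np.argsort(residuals)[::-1][:5]
better_than_predicted = np.argsort(residuals)[:5]

print(f"out-of-fold RMSE={rmse:.3f}, R^2={r2:.3f}")
print("largest positive residuals (tract indices):", worse_than_predicted)
print("largest negative residuals (tract indices):", better_than_predicted)
```

Under this reading, tracts with large negative residuals (lower prevalence than predicted) are candidates for "best practice", and tracts with large positive residuals are candidates for closer attention. Using out-of-fold rather than in-sample predictions is a design choice of this sketch, so that a flexible model cannot hide a tract's discrepancy by memorizing it.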
