ActiveRemediation: The Search for Lead Pipes in Flint, Michigan

We detail our ongoing work in Flint, Michigan, to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. Because accurate records are lacking and determining the material of a buried pipe is costly, we put forth a number of predictive and procedural tools to aid in the search for and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased federal spending on infrastructure development, we explore how our approach generalizes beyond Flint to other municipalities nationwide.
