OpenLUR: Off-the-shelf air pollution modeling with open features and machine learning

Abstract Land use regression (LUR) models are a well-established method for assessing the exposure of citizens to pollutants such as NOx or particulate matter in urban areas. LUR models leverage information about environmental and anthropogenic factors such as cars, heating, or industry to predict air pollution in areas where no measurements have been made. However, existing approaches are often not globally applicable and require tedious hyper-parameter tuning to achieve high-quality predictions. In this work, we tackle these issues by introducing OpenLUR, an off-the-shelf approach for modeling air pollution that (i) works on a set of novel features extracted solely from the globally and openly available data source OpenStreetMap and (ii) is based on state-of-the-art machine learning with automated hyper-parameter tuning to minimize manual effort. We show that our proposed features can outperform their counterparts from local and closed sources, and illustrate how automated hyper-parameter tuning can yield competitive results while reducing the need for machine learning expertise and manual effort. Importantly, we further demonstrate the potential of the global availability of our features by applying cross-learning across different cities in order to reduce the need for a large number of training samples. Overall, OpenLUR represents an off-the-shelf approach that facilitates easily reproducible experiments and the development of globally applicable models.
