HazeEst: Machine Learning Based Metropolitan Air Pollution Estimation From Fixed and Mobile Sensors

Metropolitan air pollution is a growing concern in both developing and developed countries. Fixed-station monitors, typically operated by governments, offer accurate but sparse data, and are increasingly being augmented by lower-fidelity but denser measurements taken by mobile sensors carried by concerned citizens and researchers. In this paper, we introduce HazeEst, a machine learning model that combines sparse fixed-station data with dense mobile sensor data to estimate the air pollution surface for any given hour on any given day in Sydney. We assess our system using seven regression models and tenfold cross-validation. The results show that the estimation accuracy of support vector regression (SVR) is similar to that of decision tree regression and random forest regression, and higher than that of extreme gradient boosting, multi-layer perceptrons, linear regression, and adaptive boosting regression. The air pollution estimates from our models are validated via field trials, and the results show that SVR not only yields high spatial resolution estimates that correspond well with the pollution surface obtained from fixed and mobile sensor monitoring systems, but also indicates the boundaries of polluted areas better than the other regression models. Our results can be visualized using a Web-based application customized for metropolitan Sydney. We believe that the continuous estimates provided by our system can better inform assessments of air pollution exposure and its impact on human health.
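The evaluation protocol described above, seven regression models compared under tenfold cross-validation, can be sketched roughly as follows. This is a minimal illustration and not the HazeEst implementation: the data are synthetic, the feature set (longitude, latitude, hour of day) and all hyperparameters are assumptions, and scikit-learn's `GradientBoostingRegressor` stands in for extreme gradient boosting so that the sketch depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): each row is one sensor reading
# described by longitude, latitude and hour of day.
rng = np.random.default_rng(0)
n = 300
lon = rng.uniform(150.9, 151.3, n)
lat = rng.uniform(-34.0, -33.7, n)
hour = rng.integers(0, 24, n)
X = np.column_stack([lon, lat, hour])

# Made-up pollutant level: a spatial hotspot plus a daily cycle plus noise.
y = (
    0.5
    + 2.0 * np.exp(-((lon - 151.1) ** 2 + (lat + 33.85) ** 2) / 0.01)
    + 0.3 * np.sin(2 * np.pi * hour / 24)
    + rng.normal(0.0, 0.1, n)
)

# The seven model families named in the abstract. Hyperparameters are
# illustrative guesses; SVR and the MLP are wrapped with feature scaling.
models = {
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # Stand-in for extreme gradient boosting (XGBoost).
    "Gradient boosting": GradientBoostingRegressor(random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
    "Linear regression": LinearRegression(),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

# Tenfold cross-validation; report the mean RMSE over the ten folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    neg_rmse = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    scores[name] = -neg_rmse.mean()

for name, rmse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:18s} mean RMSE = {rmse:.3f}")
```

The ranking printed by this sketch reflects only the synthetic data and says nothing about the results reported for Sydney; with real measurements, `X` and `y` would be built from the combined fixed-station and mobile sensor readings.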
