Modeling PM2.5 Urban Pollution Using Machine Learning and Selected Meteorological Parameters

Outdoor air pollution causes millions of premature deaths annually, mostly due to anthropogenic fine particulate matter (PM2.5). Quito, the capital city of Ecuador, is no exception: its pollution exceeds healthy levels. In addition to the impact of urbanization, motorization, and rapid population growth, particulate pollution is modulated by meteorological factors and geophysical characteristics, which complicate the application of even the most advanced weather forecasting models. This paper therefore proposes a machine learning approach, based on six years of meteorological and pollution data, to predict PM2.5 concentrations from wind (speed and direction) and precipitation levels. The classification model shows high reliability in distinguishing low (<10 µg/m³) from high (>25 µg/m³) concentrations and low (<10 µg/m³) from moderate (10–25 µg/m³) concentrations of PM2.5. A regression analysis suggests that PM2.5 is predicted more accurately as weather conditions become more extreme (strong winds or high levels of precipitation). The high correlation between estimated and measured data in a time series analysis of the wet season confirms this finding. The study demonstrates that statistical models based on machine learning are suitable for predicting PM2.5 concentrations from meteorological data.
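The concentration classes used by the classification model can be illustrated with a short sketch. It assumes that the boundaries 10 and 25 µg/m³ belong to the moderate class and that the high class means above 25 µg/m³. The names `pm25_class` and `binary_task`, and the layout of each sample as a (features, concentration) pair, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Class boundaries in µg/m³, as given in the abstract.
LOW_MAX = 10.0    # "low" is strictly below this value
HIGH_MIN = 25.0   # "high" is strictly above this value (assumed reading)

# Hypothetical sample layout: ((wind_speed, wind_direction, precipitation), pm25)
Sample = Tuple[Sequence[float], float]


def pm25_class(concentration: float) -> str:
    """Map a PM2.5 concentration to 'low', 'moderate', or 'high'."""
    if concentration < LOW_MAX:
        return "low"
    if concentration > HIGH_MIN:
        return "high"
    return "moderate"


def binary_task(samples: List[Sample], other: str) -> List[Tuple[Sequence[float], int]]:
    """Build a 'low versus other' dataset: label 0 for low, 1 for `other`.

    Samples that belong to neither class are dropped.
    """
    labelled = []
    for features, concentration in samples:
        label = pm25_class(concentration)
        if label == "low":
            labelled.append((features, 0))
        elif label == other:
            labelled.append((features, 1))
    return labelled


# Made-up example values, for illustration only.
example = [
    ((2.1, 180.0, 0.0), 8.0),
    ((0.5, 90.0, 0.0), 30.0),
    ((1.2, 45.0, 1.5), 17.0),
]
low_vs_high = binary_task(example, "high")
low_vs_moderate = binary_task(example, "moderate")
```

Each resulting list could then be passed to any binary classifier. The choice of classifier is left open here because the abstract does not name one.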
