PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk and so reduce these health complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran’s urban area using three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. The models use 23 features, including satellite and meteorological data, ground-measured PM2.5, and geographical data. The best performance, R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, was obtained with XGBoost after eliminating unimportant features. However, all three ML methods performed similarly: R² ranged from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. Unlike the PM2.5 lag data, satellite-derived AODs did not improve model performance.
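The three scores reported above (R², MAE, RMSE) can be illustrated with a minimal sketch. It assumes R² is the usual coefficient of determination, which the abstract does not state explicitly. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, mae, rmse) for paired observed and predicted values.

    Assumption: r2 is the coefficient of determination,
    1 - SS_res / SS_tot, which may differ from the paper's definition.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("r2 is undefined when all observed values are equal")

    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n             # mean absolute error
    rmse = math.sqrt(ss_res / n)                         # root mean squared error
    return r2, mae, rmse


# Made-up PM2.5 values in µg/m³, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

MAE and RMSE carry the units of the target (µg/m³ here), while R² is unitless, which is why the abstract can compare R² across model configurations directly.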
