How Important Is Satellite-Retrieved Aerosol Optical Depth in Deriving Surface PM2.5 Using Machine Learning?

PM2.5 refers to the total mass concentration of fine particles in the atmosphere near the surface, and it is obtained from in situ observations and satellite remote sensing. Because ground observation stations are few and unevenly distributed, and because the remote sensing retrieval is ill-posed, increasing effort has been devoted to applying machine-learning (ML) models to both ground and satellite data. A key satellite-derived parameter, aerosol optical depth (AOD), is the most commonly used proxy for PM2.5, although the correlation between the two is fraught with large uncertainties. A critical but overlooked question is how much AOD improves the retrieval of PM2.5 relative to the uncertainty it introduces. The question is addressed here by taking advantage of the high-density PM2.5 stations in eastern China to evaluate the contribution of AOD, defined as the difference in PM2.5 retrieval accuracy with and without AOD for varying densities of PM2.5 stations, using four popular ML models (Random Forest, Extra-Trees, XGBoost, and LightGBM). Our results reveal that as the density of monitoring stations decreases, both the feature importance and the permutation importance of satellite AOD show a consistent upward trend (p < 0.05). Furthermore, under both sample-based and station-based (spatial) independent cross-validation, the ML models without AOD decline faster in overall accuracy and predictive ability than the models with AOD. Overall, a 10% reduction in the number of stations increases the uncertainty in estimated and predicted accuracies by 0.7–1.2% and 0.6–1.2%, respectively. These findings attest to the indispensable role of satellite AOD in ML-based PM2.5 retrieval, because it can significantly mitigate the negative impact of a sparse distribution of monitoring sites. This role becomes more important as the number of PM2.5 stations decreases.
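
The with-and-without-AOD comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the PM2.5 formula and the predictors (temperature, wind) are invented for the example, only a scikit-learn Random Forest is used (not XGBoost or LightGBM), and the helper names `make_synthetic`, `station_cv_r2`, and `aod_contribution` are hypothetical. Station-based cross-validation is approximated with `GroupKFold`, which keeps all records from one station in the same fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupKFold, cross_val_predict

N_SPLITS = 5  # folds for station-based (spatial) cross-validation


def make_synthetic(n_stations=60, n_days=30, seed=0):
    """Toy station-day records; the PM2.5 formula is invented for illustration."""
    rng = np.random.default_rng(seed)
    station = np.repeat(np.arange(n_stations), n_days)
    n = station.size
    aod = rng.gamma(2.0, 0.3, n)
    temp = rng.normal(15.0, 8.0, n)
    wind = rng.gamma(2.0, 1.5, n)
    pm25 = 30.0 + 40.0 * aod - 2.0 * wind + 0.3 * temp + rng.normal(0.0, 5.0, n)
    X = np.column_stack([aod, temp, wind])  # column 0 is AOD
    return X, pm25, station


def station_cv_r2(X, y, station):
    """Out-of-station R^2: each station's records are predicted by a model
    trained only on other stations."""
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    cv = GroupKFold(n_splits=N_SPLITS)
    pred = cross_val_predict(model, X, y, groups=station, cv=cv)
    return r2_score(y, pred)


def aod_contribution(X, y, station, keep_fraction, seed=0):
    """Randomly keep a fraction of stations (at least N_SPLITS of them),
    then score the model with and without the AOD column."""
    rng = np.random.default_rng(seed)
    ids = np.unique(station)
    n_keep = max(N_SPLITS, int(round(keep_fraction * ids.size)))
    kept = rng.choice(ids, size=n_keep, replace=False)
    mask = np.isin(station, kept)
    r2_with = station_cv_r2(X[mask], y[mask], station[mask])
    r2_without = station_cv_r2(X[mask][:, 1:], y[mask], station[mask])
    return r2_with, r2_without


if __name__ == "__main__":
    X, y, station = make_synthetic()
    for frac in (1.0, 0.5, 0.2):
        r2_with, r2_without = aod_contribution(X, y, station, frac)
        print(f"stations kept: {frac:.0%}  R2 with AOD: {r2_with:.2f}  "
              f"without: {r2_without:.2f}  difference: {r2_with - r2_without:.2f}")
```

The difference between the two scores at each station fraction plays the role of the AOD contribution. Because the synthetic data contain no station-specific effects, the sketch only demonstrates the bookkeeping of the comparison; it does not reproduce the reported trends.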
