Investigating the performance of satellite-based models in estimating the surface PM2.5 over China.

Accurate estimation of surface PM2.5 concentration is critical for assessing PM2.5 exposure and the associated health impacts. Because ground monitoring stations have limited spatial coverage, many studies estimate surface PM2.5 concentration from satellite products, using machine learning (ML) techniques to build a relationship between satellite-retrieved aerosol optical depth (AOD) and ground-measured PM2.5 concentration. However, uncertainties in ML-based models may lead to considerable biases in PM2.5 estimation, and these need to be carefully examined. Here we evaluate the accuracy of PM2.5 concentrations estimated by two popular ML models, Random Forest and the back-propagation (BP) neural network. Both were trained and tested on hourly data covering China for the whole of 2017: satellite-retrieved AOD from Himawari, ground-measured PM2.5 from the China National Environmental Monitoring Center, ERA5 meteorological conditions, and other auxiliary variables. We propose a new validation method that takes the spatial pattern of the data into account during validation. The results suggest that traditional validation methods may overestimate model performance in estimating PM2.5 in areas with sparse in-situ measurements. Moreover, the spatial distribution of the training data strongly affects the evaluation of model performance and should be carefully considered. Future studies should include at least a site-specific validation rather than relying only on random-sampling validation.
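The contrast between random-sampling validation and site-specific validation can be illustrated with a minimal sketch. It is not the validation method proposed in this work, whose details are not given in the abstract. It only shows, on hypothetical toy data (20 stations with 50 hourly records each), why random sample-based folds let the same station appear in both training and test sets, whereas site-wise folds do not. The function names `sample_based_folds`, `site_based_folds`, and `shared_sites` are illustrative assumptions.

```python
import random

def sample_based_folds(n_samples, k, seed=0):
    """Random k-fold: shuffle individual samples, ignoring their station."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def site_based_folds(site_ids, k, seed=0):
    """Site-wise k-fold: shuffle stations, then put all samples of a station in one fold."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    fold_of_site = {s: i % k for i, s in enumerate(sites)}
    folds = [[] for _ in range(k)]
    for i, s in enumerate(site_ids):
        folds[fold_of_site[s]].append(i)
    return folds

def shared_sites(site_ids, test_idx):
    """Stations present in both the test fold and the remaining training data."""
    test = set(test_idx)
    test_sites = {site_ids[i] for i in test}
    train_sites = {s for i, s in enumerate(site_ids) if i not in test}
    return test_sites & train_sites

# Hypothetical toy data: 20 stations, 50 hourly records each (assumed sizes).
site_ids = [s for s in range(20) for _ in range(50)]

random_folds = sample_based_folds(len(site_ids), k=5)
site_folds = site_based_folds(site_ids, k=5)

# Number of stations that leak between training and test data in each fold.
leak_random = [len(shared_sites(site_ids, f)) for f in random_folds]
leak_site = [len(shared_sites(site_ids, f)) for f in site_folds]
```

With sample-based folds, each test fold shares stations with the training data, so the score mostly reflects how well a model interpolates in time at stations it has already seen. With site-wise folds the test stations are unseen, which is closer to the task of estimating PM2.5 where no monitor exists.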
