Spatiotemporal prediction of continuous daily PM2.5 concentrations across China using a spatially explicit machine learning algorithm

Abstract: The high uncertainty in China's emission inventory tends to degrade the performance of chemical transport models in predicting PM2.5 concentrations, especially on a daily basis. In this study, a novel machine learning algorithm, the Geographically-Weighted Gradient Boosting Machine (GW-GBM), was developed by extending GBM with spatial smoothing kernels that weight the loss function. This modification addresses the spatial nonstationarity of the relationships between PM2.5 concentrations and predictor variables such as aerosol optical depth (AOD) and meteorological conditions. GW-GBM also overcomes the estimation bias in PM2.5 concentrations caused by missing AOD retrievals, and thus potentially improves subsequent exposure analyses. GW-GBM performed well in predicting daily PM2.5 concentrations (R² = 0.76, RMSE = 23.0 μg/m³) even with partially missing AOD data, outperforming the original GBM model (R² = 0.71, RMSE = 25.3 μg/m³). Based on the continuous spatiotemporal predictions of PM2.5 concentrations, an estimated 95% of the population lived in areas where the annual mean PM2.5 concentration exceeded 35 μg/m³, and 45% of the population was exposed to PM2.5 above 75 μg/m³ for more than 100 days in 2014. GW-GBM thus provides accurate, continuous daily PM2.5 predictions for China that can support assessments of acute human health effects.
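The idea of weighting the loss function with a spatial kernel can be illustrated with a small sketch. This is not the authors' implementation; it is one possible reading of the abstract, under several assumptions that do not come from the source: a Gaussian kernel, a hand-picked bandwidth, squared-error loss, one-split regression stumps as base learners, a single predictor (AOD), and synthetic monitor data in which the AOD-PM2.5 relationship differs between two regions. All function names are hypothetical.

```python
import math

def gaussian_weights(sites, center, bandwidth):
    """Assumed Gaussian spatial kernel: weight decays with distance from center."""
    return [math.exp(-0.5 * (math.dist(s, center) / bandwidth) ** 2) for s in sites]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def fit_stump(x, residuals, weights):
    """Pick the single split on x that minimizes the weighted squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # dropping the max keeps the right leaf non-empty
        left = [i for i, xi in enumerate(x) if xi <= threshold]
        right = [i for i, xi in enumerate(x) if xi > threshold]
        lv = weighted_mean([residuals[i] for i in left], [weights[i] for i in left])
        rv = weighted_mean([residuals[i] for i in right], [weights[i] for i in right])
        err = sum(weights[i] * (residuals[i] - lv) ** 2 for i in left)
        err += sum(weights[i] * (residuals[i] - rv) ** 2 for i in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    return best[1:]

def fit_weighted_gbm(x, y, weights, n_stages=300, learning_rate=0.3):
    """Gradient boosting on the kernel-weighted squared-error loss."""
    base = weighted_mean(y, weights)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_stages):
        # For squared error, the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals, weights)
        stumps.append((t, lv, rv))
        pred = [p + learning_rate * (lv if xi <= t else rv) for p, xi in zip(pred, x)]
    return base, stumps, learning_rate

def predict(model, xi):
    base, stumps, lr = model
    return base + lr * sum(lv if xi <= t else rv for t, lv, rv in stumps)

# Synthetic monitors (assumption): region A near (0, 0), region B near (10, 0).
sites = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
aod = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8]
pm25 = [10, 20, 30, 40, 20, 40, 60, 80]  # slope 50 in region A, 100 in region B

# One global model (uniform weights) versus local models centered on each region.
global_model = fit_weighted_gbm(aod, pm25, [1.0] * len(sites))
local_a = fit_weighted_gbm(aod, pm25, gaussian_weights(sites, (0.5, 0.5), 2.0))
local_b = fit_weighted_gbm(aod, pm25, gaussian_weights(sites, (10.5, 0.5), 2.0))

for name, model in [("global", global_model), ("local A", local_a), ("local B", local_b)]:
    print(name, round(predict(model, 0.8), 1))
```

In this toy setting the uniformly weighted model averages the two regional relationships, while each kernel-weighted model recovers the relationship near its own center, which is the kind of spatial nonstationarity the abstract describes. How the published algorithm selects bandwidths, combines local fits, or handles missing AOD is not stated in the abstract and is not represented here.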
