A kriging-calibrated machine learning method for estimating daily ground-level NO2 in mainland China.

It remains unclear how best to combine satellite data with ground monitoring data to estimate daily NO2 levels accurately. Furthermore, conventional cross-validation (CV) results represent average performance, whereas model performance may vary greatly from grid to grid. It is therefore essential to evaluate a model's predictive ability in different grids and to determine the factors affecting its extrapolation ability, neither of which has been well examined to date. This study aimed to compare the ability of three methods to estimate daily NO2 across mainland China during 2014-2016, and to develop a novel two-stage meta-analysis method for exploring how the number and distribution of nearby sites influence grid-level prediction accuracy. To estimate daily NO2 levels, we developed and compared three methods: a universal kriging model, a satellite-based method (a non-linear exposure-lag-response model combined with extreme gradient boosting), and a kriging-calibrated satellite method. To explore the influencing factors, we proposed the two-stage meta-analysis method. The kriging-calibrated satellite method had an overall CV R2 of 0.85 and a root mean square error (RMSE) of 7.87 μg/m3, outperforming the universal kriging model and the satellite-based method (CV R2 = 0.57 and 0.81, respectively). The two-stage meta-analysis showed that model performance decreased as the distribution of nearby sites became sparser, and that adding 5 sites within 50 km in the random mode can bring a 17.51% improvement in model extrapolation ability. Kriging calibration can help satellite-based machine learning provide more accurate NO2 predictions. Our novel evaluation method can guide the effective addition of new sites within a limited budget.
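The point that an overall CV score can mask large differences between locations can be illustrated with a small sketch. The code below is not the study's pipeline: the two sites, their NO2 values, and the noise levels are synthetic assumptions, and the helper names `r2_rmse` and `per_site_metrics` are hypothetical. It only shows how pooled and per-site R2 and RMSE might be computed from held-out predictions.

```python
import numpy as np


def r2_rmse(obs, pred):
    """Return (R^2, RMSE) for paired observations and predictions."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    return float(r2), rmse


def per_site_metrics(site_ids, obs, pred):
    """Compute (R^2, RMSE) separately for each site (hypothetical helper)."""
    site_ids = np.asarray(site_ids)
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    metrics = {}
    for site in np.unique(site_ids):
        mask = site_ids == site
        metrics[str(site)] = r2_rmse(obs[mask], pred[mask])
    return metrics


# Synthetic held-out data (assumption): site "A" is predicted well,
# site "B" poorly, e.g. because it has few nearby monitors.
rng = np.random.default_rng(0)
n_days = 200
obs_a = 40 + 10 * rng.standard_normal(n_days)
obs_b = 20 + 10 * rng.standard_normal(n_days)
pred_a = obs_a + 2 * rng.standard_normal(n_days)
pred_b = obs_b + 8 * rng.standard_normal(n_days)

site_ids = np.array(["A"] * n_days + ["B"] * n_days)
obs = np.concatenate([obs_a, obs_b])
pred = np.concatenate([pred_a, pred_b])

overall = r2_rmse(obs, pred)
by_site = per_site_metrics(site_ids, obs, pred)

print("overall R2=%.2f RMSE=%.2f" % overall)
for site, (r2, rmse) in by_site.items():
    print("site %s R2=%.2f RMSE=%.2f" % (site, r2, rmse))
```

In a two-stage design of the kind the abstract describes, per-location metrics like these would be the first-stage quantities, which a second stage could then relate to the number and distribution of nearby sites. How the study actually does this is not specified in the abstract.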
