Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach.

Accurate measurements of ground-level PM2.5 (particulate matter with an aerodynamic diameter of 2.5 μm or less) concentrations are critically important to human and environmental health studies. Satellite-derived gridded PM2.5 datasets, particularly those derived from chemical transport models (CTMs), are attractive for their broad geographic and temporal coverage. CTM-based approaches, however, often yield results at a coarse spatial resolution (typically 0.1°) and tend to ignore or simplify the impact of geographic and socioeconomic factors on PM2.5 concentrations. In this study, focusing on the long-term PM2.5 distribution in the contiguous United States, we adopt a random forests-based geostatistical (regression kriging) approach to improve one of the most commonly used satellite-derived gridded PM2.5 datasets, refining its spatial resolution to 0.01° and enhancing its accuracy. By combining the random forests machine learning method with the kriging family of methods, the approach integrates ground-based PM2.5 measurements and related geographic variables while accounting for non-linear interactions and complex spatial dependence. The accuracy and advantages of the proposed approach are demonstrated by comparison with existing PM2.5 datasets. This manuscript also highlights the effectiveness of geographic variables in long-term PM2.5 mapping, including the brightness of nighttime lights, the normalized difference vegetation index (NDVI), and elevation, and discusses the contribution of each variable to the spatial distribution of PM2.5 concentrations.
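
The general idea of random forests-based regression kriging can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the synthetic data, the exponential covariance model, the fixed correlation range and nugget fraction, and the use of out-of-bag residuals are all assumptions made for illustration. A random forest models the trend from the covariates, the residuals at the monitoring sites are interpolated by ordinary kriging, and the two parts are added.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def exp_cov(h, sill, corr_range, nugget=0.0):
    """Exponential covariance; the nugget applies only at zero distance."""
    return sill * np.exp(-h / corr_range) + nugget * (h == 0)


def ordinary_kriging(xy_obs, z_obs, xy_new, sill, corr_range, nugget):
    """Ordinary kriging of z_obs to xy_new with a fixed (assumed) covariance."""
    n = len(xy_obs)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=2)
    # Kriging system; the extra row and column force the weights to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d_obs, sill, corr_range, nugget)
    A[n, n] = 0.0
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = exp_cov(d_new, sill, corr_range)
    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ z_obs


def rf_regression_kriging(X_obs, xy_obs, y_obs, X_new, xy_new,
                          corr_range=2.0, nugget_frac=0.2):
    """Random forest trend plus kriged residuals (illustrative only)."""
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X_obs, y_obs)
    # Assumption: out-of-bag residuals, since in-sample residuals of a
    # random forest tend to be unrealistically small.
    resid = y_obs - rf.oob_prediction_
    # Assumption: covariance parameters are set by hand, not fitted
    # to an empirical variogram.
    nugget = nugget_frac * resid.var()
    sill = resid.var() - nugget
    resid_new = ordinary_kriging(xy_obs, resid, xy_new, sill, corr_range, nugget)
    return rf.predict(X_new) + resid_new, rf.feature_importances_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 150
    xy_obs = rng.uniform(0, 10, size=(n, 2))   # hypothetical station coordinates
    X_obs = rng.normal(size=(n, 3))            # stand-ins for lights, NDVI, elevation
    y_obs = (8 + 2 * X_obs[:, 0] - 1.5 * X_obs[:, 1] * (X_obs[:, 2] > 0)
             + np.sin(xy_obs[:, 0]) + rng.normal(scale=0.3, size=n))
    xy_new = rng.uniform(0, 10, size=(5, 2))
    X_new = rng.normal(size=(5, 3))
    pred, importance = rf_regression_kriging(X_obs, xy_obs, y_obs, X_new, xy_new)
    print("predictions:", np.round(pred, 2))
    print("impurity-based importances:", np.round(importance, 2))
```

In a real application the covariance parameters would normally be estimated from the residual variogram, and the impurity-based importances returned here are only one of several ways to gauge the contribution of each covariate.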
