Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China

Fine particulate matter (PM2.5) has altered the Earth's radiation balance and raised environmental and health risks for decades, but it has been widely monitored in China only since 2013. Long-term historical PM2.5 records with high temporal resolution are essential for both research and environmental management, yet they are lacking. Here, we reconstruct a site-based PM2.5 dataset at 6-hour intervals from 1960 to 2020 that combines long-term visibility, conventional meteorological observations, emissions, and elevation. The PM2.5 concentration at each site is estimated with an advanced machine learning model, LightGBM, that takes advantage of spatial features from 20 surrounding meteorological stations. Our model's performance is comparable to or even better than that of previous studies in by-year cross validation (CV) (R² = 0.7) and spatial CV (R² = 0.76), and it has the added advantages of a longer record and a higher temporal resolution. By incorporating spatial features, this model also reconstructs a 0.25° × 0.25°, 6-hourly, gridded PM2.5 dataset. The results show that, on an interdecadal scale, PM2.5 pollution gradually worsened or held steady before 2010 and eased in the following decade. Although the turning points vary among regions, PM2.5 mass concentrations in key regions decreased significantly after 2013 due to clean air actions. In particular, the annual average PM2.5 concentration in 2020 is close to its lowest value since 1960. These two PM2.5 datasets (publicly available at https://doi.org/10.5281/zenodo.6372847) provide spatiotemporal variations at high resolution, which lay the foundation for research on air pollution, climate change, and atmospheric chemical reanalysis.

based on 674 publicly available meteorological stations. Gui et al. (2020) constructed a virtual daily PM2.5 network at 1180 meteorological sites between 2017 and 2018.
Our previous research also shows that the visibility-based machine learning model that takes advantage of spatial features has great potential in reconstructing historical PM2.5 datasets with long-term records and high temporal resolution (Zhong et al., 2021). In this study, we reconstruct a site-based PM2.5 dataset at 6-hour intervals from 1960 to 2020 based on long-term visibility and conventional meteorological observations from ~2450 national stations, together with emissions and elevation. The PM2.5 concentration at each site is estimated based on a Light Gradient Boosting Machine (LightGBM) model that takes advantage of spatial features from 20 surrounding meteorological stations. By incorporating spatial features, this model also reconstructs a 0.25° × 0.25°, 6-hourly, gridded PM2.5 dataset. These two PM2.5 datasets provide spatiotemporal variations at high resolution, which constitute the basis for research studies associated with air pollution, climate change, and atmospheric chemical reanalysis.
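
The idea of drawing spatial features from the 20 surrounding meteorological stations can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`visibility_km`, `elevation_m`), the use of great-circle distance to rank neighbours, and the choice to include each neighbour's distance as a feature are all assumptions made for illustration, and the synthetic station grid is invented data. The other predictors named in the text (conventional meteorological observations and emissions) are omitted, as is the training step, in which rows like these would be passed to a gradient-boosting regressor such as LightGBM.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stations(target, stations, k=20):
    """Return the k stations closest to `target` as (distance_km, station) pairs.

    The target station itself is excluded. If fewer than k other stations
    exist, all of them are returned.
    """
    ranked = sorted(
        (
            (haversine_km(target["lat"], target["lon"], s["lat"], s["lon"]), s)
            for s in stations
            if s["id"] != target["id"]
        ),
        key=lambda pair: pair[0],
    )
    return ranked[:k]


def spatial_feature_row(target, stations, k=20):
    """Flatten the target's own values and its neighbours' values into one row.

    Layout (hypothetical): [own visibility, own elevation,
    dist_1, vis_1, dist_2, vis_2, ..., dist_k, vis_k].
    """
    row = [target["visibility_km"], target["elevation_m"]]
    for dist, s in nearest_stations(target, stations, k):
        row.extend([dist, s["visibility_km"]])
    return row


# Synthetic 5 x 6 grid of stations (invented values, for demonstration only).
stations = [
    {
        "id": i,
        "lat": 30.0 + 0.5 * (i // 6),
        "lon": 110.0 + 0.5 * (i % 6),
        "visibility_km": 5.0 + i % 7,
        "elevation_m": 100.0 + 10.0 * i,
    }
    for i in range(30)
]

row = spatial_feature_row(stations[0], stations, k=20)
print(len(row))  # 2 own features + 2 per neighbour
```

With 20 neighbours, each row holds 2 + 2 × 20 = 42 values. The same routine could in principle be applied to a grid-cell centre instead of a station, which is one plausible way a site-trained model might be extended to a gridded product, though the text does not specify how the gridded dataset is built.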

[1]  Zijiang Zhou,et al.  Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China , 2022, Earth System Science Data.

[2]  N. Chang,et al.  LGHAP: the Long-term Gap-free High-resolution Air Pollutant concentration dataset, derived via tensor-flow-based multimodal data fusion , 2022, Earth System Science Data.

[3]  Zhikui Chen,et al.  Multimodal Data Fusion , 2022, Honoring Professor Mohammad S. Obaidat.

[4]  M. Brauer,et al.  Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty. , 2021, Environmental science & technology.

[5]  Q. Xiao,et al.  Tracking Air Pollution in China: Near Real-Time PM2.5 Retrievals from Multisource Data Fusion. , 2021, Environmental science & technology.

[6]  Jianlin Hu,et al.  High-Resolution Spatiotemporal Modeling for Ambient PM2.5 Exposure Assessment in China from 2013 to 2019. , 2021, Environmental science & technology.

[7]  Junying Sun,et al.  Robust prediction of hourly PM2.5 from meteorological data using LightGBM , 2021, National science review.

[8]  Zemin Wang,et al.  Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model. , 2020, Environment international.

[9]  Qingyang Xiao,et al.  An Ensemble Machine-Learning Model To Predict Historical PM2.5 Concentrations in China from Satellite Data. , 2018, Environmental science & technology.

[10]  Meng Li,et al.  Trends in China's anthropogenic emissions since 2010 as the consequence of clean air actions , 2018, Atmospheric Chemistry and Physics.

[11]  Tie-Yan Liu,et al.  LightGBM: A Highly Efficient Gradient Boosting Decision Tree , 2017, NIPS.

[12]  Miaomiao Liu,et al.  Visibility-Based PM2.5 Concentrations in China: 1957-1964 and 1973-2014. , 2017, Environmental science & technology.

[13]  Huarong Zhao,et al.  Relative contributions of boundary-layer meteorological factors to the explosive growth of PM2.5 during the red-alert heavy pollution episodes in Beijing in December 2016 , 2017, Journal of Meteorological Research.

[14]  Cheng Liu,et al.  Feedback effects of boundary-layer meteorological factors on cumulative explosive growth of PM2.5 during winter heavy pollution episodes in Beijing from 2013 to 2016 , 2017 .

[15]  C. Flynn,et al.  The MERRA-2 Aerosol Reanalysis, 1980 - onward, Part I: System Description and Data Assimilation Evaluation. , 2017, Journal of climate.

[16]  Bin Zhao,et al.  The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). , 2017, Journal of climate.

[17]  Shu Tao,et al.  Modeling temporal variations in global residential energy consumption and pollutant emissions , 2016 .

[18]  Xiaoping Liu,et al.  Satellite-based ground PM2.5 estimation using timely structure adaptive modeling , 2016 .

[19]  Jie Chen,et al.  Long-term exposure to urban air pollution and lung cancer mortality: A 12-year cohort study in Northern China. , 2016, The Science of the total environment.

[20]  F. Joseph Turk,et al.  An 11-year global gridded aerosol optical thickness reanalysis (v1.0) for atmospheric and climate sciences , 2016 .

[21]  P. Colarco,et al.  The MERRA-2 Aerosol Reanalysis , 2015 .

[22]  S. Tao,et al.  Global organic carbon emissions from primary sources from 1960 to 2009 , 2015 .

[23]  S. Tao,et al.  Quantification of global primary emissions of PM2.5, PM10, and TSP from combustion and industrial process sources. , 2014, Environmental science & technology.

[24]  Philippe Ciais,et al.  Trend in global black carbon emissions from 1960 to 2007. , 2014, Environmental science & technology.

[25]  G. Carmichael,et al.  Asian emissions in 2006 for the NASA INTEX-B mission , 2009 .

[26]  Bert Brunekreef,et al.  Long-Term Effects of Traffic-Related Air Pollution on Mortality in a Dutch Cohort (NLCS-AIR Study) , 2007, Environmental health perspectives.

[27]  Zijiang Zhou,et al.  Typical severe dust storms in northern China during 1954–2002 , 2003 .