Deep learning spatiotemporal air pollution data in China using data fusion

An efficient and effective spatiotemporal prediction algorithm for PM2.5 (particulate matter with a diameter of less than 2.5 micrometers) is urgently needed to study the distribution of PM2.5 over a continuous spatiotemporal domain. Such an algorithm not only supports scientific decisions on the prevention and control of PM2.5 pollution but also enables meaningful assessment of the quantitative relationship between adverse health effects and PM2.5 concentrations over time. Existing spatiotemporal interpolation algorithms usually assume that the interpolation model follows an explicit and simple mathematical description. Unfortunately, real-world data rarely follow such idealized models. Combining data fusion techniques with a Long Short-Term Memory (LSTM) recurrent neural network (RNN), we present a novel spatiotemporal interpolation model that achieves high estimation accuracy over a long time period and a large area. Four experiments were conducted to evaluate the efficiency and effectiveness of the proposed approach, using a fused dataset of daily PM2.5 data, meteorological data, elevation data, and land-use data collected in China in 2016. The results show that applying the LSTM RNN to the fused dataset achieves consistently high accuracy across different geographic regions.
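As a rough illustration of what fusing these sources can look like in practice, the following minimal sketch joins daily PM2.5 readings with daily meteorological values and static per-station attributes, then cuts the fused rows into fixed-length sequences of the kind a recurrent network typically consumes. It is not the authors' pipeline: the station ID, field layout, values, window length, and the helper names `fuse` and `make_windows` are all hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (station_id, day_index); hypothetical key layout

# Toy, made-up records for a single station (illustrative values only).
pm25: Dict[Key, float] = {("S1", 1): 35.0, ("S1", 2): 40.0,
                          ("S1", 3): 38.0, ("S1", 4): 50.0}
weather: Dict[Key, Tuple[float, float]] = {  # (temperature_c, relative_humidity)
    ("S1", 1): (12.0, 0.60), ("S1", 2): (13.5, 0.55),
    ("S1", 3): (11.0, 0.70), ("S1", 4): (10.0, 0.80)}
static: Dict[str, Tuple[float, float]] = {  # (elevation_m, land_use_code)
    "S1": (44.0, 1.0)}


def fuse(pm25, weather, static) -> Dict[Key, List[float]]:
    """Join the sources on (station, day); keep only rows present in all of them."""
    fused = {}
    for (station, day), value in pm25.items():
        if (station, day) in weather and station in static:
            # Row layout: daily weather, static attributes, then PM2.5 last.
            fused[(station, day)] = [*weather[(station, day)], *static[station], value]
    return fused


def make_windows(fused, station: str, window: int):
    """Build (sequence, target) pairs from runs of consecutive days."""
    days = sorted(d for (s, d) in fused if s == station)
    samples = []
    for i in range(len(days) - window):
        span, target_day = days[i:i + window], days[i + window]
        if target_day - span[0] != window:
            continue  # skip windows that straddle a gap in the record
        x = [fused[(station, d)] for d in span]
        y = fused[(station, target_day)][-1]  # PM2.5 on the following day
        samples.append((x, y))
    return samples


fused = fuse(pm25, weather, static)
samples = make_windows(fused, "S1", window=2)
```

Each sample here is a `window`-by-5 sequence of fused features paired with the PM2.5 value that follows it; a real setup would also need normalization and a way to bring in readings from neighboring stations.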
