Air Quality Forecast through Integrated Data Assimilation and Machine Learning

Numerical chemical transport models have long been used to simulate the complex processes that govern the formation and transport of air pollutants. Although these models can predict the spatiotemporal variability of many chemical species, their accuracy is often limited. Over the past two decades, data assimilation methods have therefore been applied to improve forecasts with the available measurements. More recently, machine learning techniques have opened new opportunities for improving air quality forecasts. We present a case study on PM10 concentrations during a dust storm. PM10 concentrations arise from multiple emission sources, e.g., desert dust and anthropogenic emissions. Accurately modeling the PM10 levels attributable to local anthropogenic emissions is therefore essential for an adequate evaluation of the dust level. However, local emissions cannot be measured in real time, so no direct data are available. Indeed, the lack of timely emission inventories is one of the main reasons that current numerical chemical transport models cannot produce accurate simulations of anthropogenic PM10. Using machine learning to generate local emission estimates from real-time observations is a promising approach. We report how it can be combined with data assimilation to considerably improve the accuracy of air quality forecasts.
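To make the combination concrete, the following is a minimal sketch of the general idea and not the method used in the study. It assumes that a machine learning model has already supplied an estimate of the anthropogenic PM10 contribution, subtracts it from an observed total to obtain a dust-only observation, and blends that observation with a model forecast through a scalar best-linear-unbiased update (a one-variable Kalman analysis step). All function names, error variances, and concentration values are hypothetical and chosen only for illustration.

```python
def dust_observation(total_pm10, anthropogenic_pm10):
    """Dust part of an observed PM10 value (assumed additive), clipped at zero."""
    return max(total_pm10 - anthropogenic_pm10, 0.0)


def assimilate(forecast, forecast_var, observation, observation_var):
    """Scalar best-linear-unbiased update of a forecast with one observation."""
    # The gain weights the observation by the relative size of the forecast error.
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Illustrative numbers only (ug/m3); none are taken from the case study.
observed_total = 500.0      # hypothetical total PM10 measured at a station
ml_anthropogenic = 100.0    # hypothetical ML estimate of the anthropogenic part
dust_obs = dust_observation(observed_total, ml_anthropogenic)

dust_analysis, dust_analysis_var = assimilate(
    forecast=200.0,            # hypothetical dust forecast from a transport model
    forecast_var=100.0 ** 2,   # assumed forecast error variance
    observation=dust_obs,
    observation_var=50.0 ** 2, # assumed observation error variance
)
print(dust_obs, dust_analysis, dust_analysis_var)
```

In this sketch, a biased anthropogenic estimate would pass directly into the dust observation, which illustrates why the accuracy of the anthropogenic PM10 estimate matters for evaluating the dust level.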
