Predicting air quality with deep learning LSTM: Towards comprehensive models

Abstract In this paper we address the problem of predicting air quality in the region of Madrid using long short-term memory (LSTM) recurrent neural networks. Air quality, in this study, is represented by the concentrations of a series of air pollutants known to pose a risk to human health, such as CO, NO2, O3, PM10 and SO2, together with airborne pollen concentrations of two taxa (Plantago and Poaceae). These concentrations are sampled at a set of locations in the city of Madrid. Instead of training an array of models, one per location and pollutant, several comprehensive deep network configurations are compared to identify those that best extract relevant information from the set of time series in order to predict air quality one day ahead. The results, supported by statistical evidence, indicate that a single comprehensive model might be a better option than multiple individual models. Such comprehensive models can provide useful forecasts that could be applied, for example, by clinical institutions in a management setting to optimize resources in anticipation of an increase in the number of patients due to exposure to poor air quality.
