A data-driven approach to the forecasting of ground-level ozone concentration

The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers who wish to reduce the impact of pollution on public health through active measures (e.g. temporary traffic closures). In this study, we present a machine learning approach to forecasting the day-ahead maximum ozone concentration at several geographical locations in southern Switzerland. Starting from a dataset containing thousands of historical air quality and weather measurements as well as numerical weather predictions, we select the most relevant features with a genetic algorithm and use them to train a number of regression models. Having found that forcing engineered features suggested by domain experts into the initial population of the genetic algorithm does not increase the accuracy of the final forecasters, we adopted a procedure that is entirely agnostic to atmospheric physics. We then used Shapley values to explain the learned models in terms of feature importance and feature interactions in relation to ozone predictions. Our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables that are described in the ozone photochemistry literature.
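To make the explanation step concrete, the following is a minimal sketch of how exact Shapley values attribute a single prediction to its input features. It is not the study's implementation: the model `toy_model`, its coefficients, the feature names (`temperature`, `no2`, `radiation`), and the zero baseline are all hypothetical and chosen only so that the arithmetic is easy to follow. Features outside a coalition are replaced by their baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a trained ozone regressor (not the study's model).

    It contains one interaction term (temperature * radiation) so that the
    attribution of a cross-dependency can be observed.
    """
    return (2.0 * x["temperature"]
            + 1.0 * x["no2"]
            + 0.5 * x["temperature"] * x["radiation"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values of one prediction, enumerating all coalitions.

    Features not in a coalition take their baseline value (an assumption of
    this sketch). The cost grows as 2^n, so this only suits small n.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Evaluate the model with coalition features taken from the instance
        # and all remaining features taken from the baseline.
        mixed = {k: (instance[k] if k in coalition else baseline[k])
                 for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


# Hypothetical, unitless inputs used only for illustration.
instance = {"temperature": 3.0, "no2": 1.0, "radiation": 2.0}
baseline = {"temperature": 0.0, "no2": 0.0, "radiation": 0.0}

phi = shapley_values(toy_model, instance, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:.3f}")
```

In this toy case the attributions sum to the difference between the prediction and the baseline prediction (the efficiency property), and the contribution of the interaction term is split equally between `temperature` and `radiation`. For models with many features, exhaustive enumeration is impractical and approximate or model-specific algorithms are used instead.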
