Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks

This paper presents an original approach that combines Artificial Neural Networks (ANNs) and clustering in order to detect pollutant peaks. We developed air quality forecasting models, based on machine learning methods, that predict hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10) 24 hours ahead. A MultiLayer Perceptron (MLP) was first used alone, then hybridized successively with hierarchical clustering and with a combination of a self-organizing map and k-means clustering. In the hybrid models, clustering was used to subdivide the dataset, and a separate MLP was then trained on each subset. Two urban sites in Corsica, an island in the western Mediterranean Sea, were investigated. The models showed good overall precision (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that forecasting models used on an operational basis correctly predict pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that for PM10 and O3, the hybrid models combining clustering and MLP outperform the classical MLP most of the time when predicting high concentrations. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
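The cluster-then-train scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions: plain k-means on a single scalar input stands in for the hierarchical and self-organizing map + k-means clustering, and an ordinary least-squares line stands in for the per-cluster MLP, so that the example needs only the Python standard library. The names `kmeans`, `fit_line` and `HybridForecaster` are hypothetical. What it shows is the routing logic: each subset gets its own model, and a new input is forecast by the model attached to its nearest cluster centre.

```python
import random
from statistics import mean


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on scalar inputs; returns the k cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise from k data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            groups[nearest].append(p)
        # Recompute each centre; keep the old one if its cluster is empty.
        new = [mean(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:  # assignments have stabilised
            break
        centres = new
    return centres


def fit_line(xs, ys):
    """Least-squares line y = a + b*x.

    Stand-in (assumption) for the per-cluster MLP, chosen for brevity.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


class HybridForecaster:
    """Hypothetical cluster-then-regress model: one regressor per cluster."""

    def __init__(self, k):
        self.k = k
        self.centres = []
        self.models = []

    def _cluster(self, x):
        """Index of the nearest cluster centre."""
        return min(range(self.k), key=lambda j: abs(x - self.centres[j]))

    def fit(self, xs, ys):
        self.centres = kmeans(xs, self.k)
        fallback = fit_line(xs, ys)  # used if a cluster receives no samples
        self.models = []
        for j in range(self.k):
            sub = [(x, y) for x, y in zip(xs, ys) if self._cluster(x) == j]
            if sub:
                self.models.append(fit_line([x for x, _ in sub], [y for _, y in sub]))
            else:
                self.models.append(fallback)
        return self

    def predict(self, x):
        """Route x to its cluster's model and evaluate that model."""
        a, b = self.models[self._cluster(x)]
        return a + b * x


# Two regimes with different linear behaviour.
xs = list(range(10)) + list(range(100, 110))
ys = [2 * x for x in range(10)] + [300 - x for x in range(100, 110)]
model = HybridForecaster(k=2).fit(xs, ys)
print(model.predict(5), model.predict(105))  # 10.0 195.0
```

A single global line would fit neither regime well, whereas each per-cluster model recovers its own regime exactly. Multivariate inputs, as in a real forecasting setting, would only change the distance function and the per-cluster learner.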
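The Index of Agreement values quoted above are assumed here to follow Willmott's standard definition, d = 1 - sum((P_i - O_i)^2) / sum((|P_i - Obar| + |O_i - Obar|)^2), where O_i are the observed concentrations, P_i the forecasts and Obar the mean observation; d ranges from 0 (no agreement) to 1 (perfect agreement). A short sketch of the computation, with a hypothetical function name:

```python
from statistics import mean


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d (assumed standard form), in [0, 1]."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty series of equal length")
    o_bar = mean(observed)
    # Squared forecast errors.
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": deviations of both series from the observed mean.
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(observed, predicted))
    # If every value equals the observed mean, the forecast is perfect.
    return 1.0 - num / den if den else 1.0


d = index_of_agreement([10, 20, 30], [12, 18, 33])
print(round(d, 4))  # 0.9802
```

Unlike a correlation coefficient, d penalizes systematic bias as well as poor timing, which is why it is a common summary score for concentration forecasts.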
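For the peak-detection analysis, one common way to build a ROC curve (assumed here; the abstract does not give the exact protocol) is to label an hour as a true peak when the observed concentration exceeds an alert level, then sweep a cut-off over the forecast concentrations and record the false-positive and true-positive rates at each step. The names `roc_points` and `auc`, and the alert level used below, are hypothetical.

```python
def roc_points(observed, predicted, peak_level):
    """Return (FPR, TPR) pairs obtained by sweeping a cut-off over the forecasts.

    An hour is a true peak when the observed value exceeds peak_level;
    it is flagged as a forecast peak when the forecast is >= the cut-off.
    """
    actual = [o > peak_level for o in observed]
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both peak and non-peak hours")
    points = [(0.0, 0.0)]  # cut-off above every forecast: nothing flagged
    for cut in sorted(set(predicted), reverse=True):
        flagged = [p >= cut for p in predicted]
        tp = sum(f and a for f, a in zip(flagged, actual))
        fp = sum(f and not a for f, a in zip(flagged, actual))
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


observed = [50, 80, 130, 150, 60, 140]
predicted = [55, 130, 125, 100, 70, 135]
pts = roc_points(observed, predicted, peak_level=120)
print(round(auc(pts), 4))  # 0.7778
```

Each point corresponds to one cut-off applied to the forecasts; a model that ranks peak hours above non-peak hours more consistently yields a curve closer to the top-left corner and an area closer to 1.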
