PM10 forecasting using clusterwise regression

Abstract In this paper, we are interested in the statistical forecasting of the daily mean PM10 concentration. Hourly concentrations of PM10 have been measured in the city of Rouen, in Haute-Normandie, France. Rouen is located northwest of Paris, near the southern shore of the English Channel (La Manche), and is heavily industrialised. We consider three monitoring stations reflecting the diversity of situations: an urban background station, a traffic station and an industrial station near the cereal harbour of Rouen. We focus on data for the months that register the higher values, December to March, over the years 2004–2009. The models are fitted on the winter days of the four seasons 2004/2005 to 2007/2008 (training data), and forecasting performance is then evaluated on the winter days of the season 2008/2009 (test data). We show that it is possible to accurately forecast the daily mean concentration by fitting a function of meteorological predictors and the average concentration measured on the previous day. Observed values of the meteorological variables are used both for fitting the models and for forecasting on the test data. We compare the forecasts produced by three different methods: persistence, generalized additive nonlinear models and clusterwise linear regression models. The last method gives very good results, and the end of the paper analyzes the reasons for such good behavior.
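To make two of the compared methods concrete, the following is a minimal Python sketch, not the authors' implementation. It shows the persistence baseline (the forecast for a day is the previous day's observed mean) and a simple hard-assignment variant of clusterwise linear regression, which alternates between fitting one least-squares line per cluster and reassigning each day to the cluster whose line fits it best. The function names, the number of clusters `k`, the random initialisation and the stopping rule are all assumptions made for illustration; the abstract does not specify the fitting algorithm, and a mixture-model (EM) formulation is an equally plausible reading.

```python
import numpy as np


def persistence_forecast(pm10):
    """Persistence baseline: the forecast for day t is the observed mean of day t-1.

    Returns forecasts for days 1..n-1 (the first day has no forecast).
    """
    pm10 = np.asarray(pm10, dtype=float)
    return pm10[:-1]


def fit_clusterwise(X, y, k=2, n_iter=50, seed=0):
    """Hard-assignment clusterwise linear regression (illustrative sketch).

    X : (n, p) predictors, e.g. meteorological variables and previous-day PM10.
    y : (n,) daily mean PM10.
    Returns (coefs, labels): coefs has shape (k, p + 1) with the intercept first,
    labels gives the cluster index of each training day.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    labels = rng.integers(0, k, size=len(y))    # assumed: random initial partition
    coefs = np.zeros((k, X1.shape[1]))
    for _ in range(n_iter):
        # Step 1: refit one least-squares regression per cluster.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= X1.shape[1]:       # skip clusters too small to fit
                coefs[j] = np.linalg.lstsq(X1[mask], y[mask], rcond=None)[0]
        # Step 2: reassign each day to the cluster with the smallest squared residual.
        sq_resid = (y[:, None] - X1 @ coefs.T) ** 2
        new_labels = sq_resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # partition is stable: stop
            break
        labels = new_labels
    return coefs, labels
```

This sketch covers only the fitting step on training data. Forecasting a new day additionally requires a rule for choosing which cluster's regression to apply, which the abstract does not describe and which is therefore left out here.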
