Robust Regression Models for Predicting PM10 Concentration in an Industrial Area

Particulate matter (PM) is an air pollutant consisting of a mixture of solid and liquid particles suspended in the air; PM10 denotes particles with a diameter less than or equal to 10 micrometres. PM10 can cause significant health effects, particularly among the elderly, infants, and people with asthma and other respiratory diseases. The aim of this study is to determine the best robust regression models for future prediction of PM10 concentration in Pulau Pinang, Malaysia. Robust regression is less sensitive than ordinary least squares (OLS) to large changes in small parts of the data, such as outliers, because it assigns a weight to each data point and down-weights observations with large residuals. The weighting functions compared in this study are Huber, Andrews, Bisquare, Cauchy, Fair, Logistic, Talwar and Welsch, with OLS (equal weights for all points) included as a baseline. The model comparison statistics were prediction accuracy (PA), coefficient of determination (R²), index of agreement (IA), normalised absolute error (NAE) and root mean square error (RMSE). They show that Fair is the best weighting function for next-day prediction (RMSE = 11.077, NAE = 0.122, PA = 0.927, IA = 0.961, R² = 0.858) and next-2-day prediction (RMSE = 14.153, NAE = 0.122, PA = 0.927, IA = 0.961, R² = 0.773), while Cauchy is the best for next-3-day prediction (RMSE = 16.012, NAE = 0.122, PA = 0.927, IA = 0.961, R² = 0.718). These performance indicators show that the developed robust regression models can be used for longer-term prediction of PM10, up to three days ahead.
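The weighting idea can be illustrated with iteratively reweighted least squares (IRLS), the usual way such robust M-estimators are fitted. The set of weighting functions named above matches the options of MATLAB's `robustfit`, so the sketch below uses the default tuning constants that `robustfit` documents. That the study used these defaults is an assumption. The names `TUNE`, `weights` and `irls` are hypothetical, the predictors `X` stand in for whatever variables the models actually used, and the leverage adjustment that `robustfit` applies to residuals is omitted for simplicity.

```python
import numpy as np

# Default tuning constants as documented for MATLAB's robustfit
# (assumption: the study used these defaults). "ols" gives equal weights.
TUNE = {
    "andrews": 1.339, "bisquare": 4.685, "cauchy": 2.385, "fair": 1.400,
    "huber": 1.345, "logistic": 1.205, "talwar": 2.795, "welsch": 2.985,
    "ols": 1.0,
}

def weights(r, name):
    """Weight for each scaled residual r under the named weight function."""
    a = np.abs(r)
    if name == "andrews":
        # sin(r)/r inside |r| < pi, zero outside; np.sinc(x) = sin(pi x)/(pi x)
        return np.where(a < np.pi, np.sinc(r / np.pi), 0.0)
    if name == "bisquare":
        return np.where(a < 1, (1 - r**2) ** 2, 0.0)
    if name == "cauchy":
        return 1 / (1 + r**2)
    if name == "fair":
        return 1 / (1 + a)
    if name == "huber":
        return 1 / np.maximum(1, a)
    if name == "logistic":
        # tanh(r)/r, using its limit 1 at r = 0 to avoid division by zero
        safe = np.where(a < 1e-12, 1.0, r)
        return np.where(a < 1e-12, 1.0, np.tanh(safe) / safe)
    if name == "talwar":
        return (a < 1).astype(float)
    if name == "welsch":
        return np.exp(-(r**2))
    if name == "ols":
        return np.ones_like(r)
    raise ValueError(f"unknown weight function: {name}")

def irls(X, y, name="fair", max_iter=50, tol=1e-8):
    """Robust linear fit by IRLS; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]    # start from the OLS fit
    for _ in range(max_iter):
        resid = y - X1 @ beta
        # Robust scale estimate: median absolute deviation / 0.6745
        s = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if s == 0:
            break
        w = weights(resid / (TUNE[name] * s), name)
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(weight), then solve
        beta_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol * (1 + np.max(np.abs(beta)))
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic example (not the study's data): a line with three gross outliers
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, 50)
y[:3] += 40
b_ols = irls(x, y, "ols")
b_fair = irls(x, y, "fair")
print("OLS :", b_ols)
print("Fair:", b_fair)
```

With the outliers present, the OLS line is pulled upward. The Fair fit stays close to the true intercept 2 and slope 3 because the outlying points receive weights near zero.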
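The five comparison statistics can be computed from observed values O and predicted values P as sketched below. These are the forms commonly used in PM10 prediction studies: RMSE, NAE = Σ|P − O| / ΣO, PA = Σ(P − Ō)² / Σ(O − Ō)², R² as the squared Pearson correlation, and Willmott's index of agreement. That the study used exactly these definitions is an assumption, and the function name `performance` is hypothetical.

```python
import numpy as np

def performance(obs, pred):
    """Model comparison statistics for observed (O) vs predicted (P) values."""
    O = np.asarray(obs, dtype=float)
    P = np.asarray(pred, dtype=float)
    O_bar = O.mean()
    rmse = np.sqrt(np.mean((P - O) ** 2))                           # error: lower is better
    nae = np.sum(np.abs(P - O)) / np.sum(O)                         # error: lower is better
    pa = np.sum((P - O_bar) ** 2) / np.sum((O - O_bar) ** 2)        # accuracy: nearer 1 is better
    r2 = np.corrcoef(P, O)[0, 1] ** 2                               # squared Pearson correlation
    ia = 1 - np.sum((P - O) ** 2) / np.sum(
        (np.abs(P - O_bar) + np.abs(O - O_bar)) ** 2)               # Willmott's index of agreement
    return {"RMSE": rmse, "NAE": nae, "PA": pa, "R2": r2, "IA": ia}

# Toy example: every prediction is 1 unit too high
print(performance([1, 2, 3, 4], [2, 3, 4, 5]))
```

For the toy example, R² is 1 because the predictions are perfectly correlated with the observations. The constant bias still shows up in RMSE = 1, NAE = 0.4 and IA = 0.84. This is why the study reports several indicators rather than relying on R² alone.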