Support vector regression with missing data treatment based variables selection for water level prediction of Galas River in Kelantan Malaysia

Rising water levels are an important issue for the Galas River in Kuala Krai, Kelantan, Malaysia, because the water level is one of the main indicators of flooding once it reaches a certain threshold. The rise in water level is influenced by several factors, called the predictor variables, such as month, rainfall, temperature, relative humidity and surface wind. The data for this analysis, comprising the predictors and the water level as the response, were collected from the Water Resources Management and Hydrology Division of the Department of Irrigation and Drainage Malaysia and from the Malaysian Meteorological Department. The collected data contain missing values, so the datasets must be pre-processed before modelling. We apply two types of pre-processing to handle the missing values: replacement by the mean (type I pre-processing) and Ordinary Linear Regression (type II pre-processing). Because the data involve many variables, selecting suitable predictor variables is useful for developing the prediction model. The predictor variables are selected using Support Vector Regression (SVR) and cross-validation to obtain an appropriate prediction of the water level of the Galas River at Kuala Krai. We use K-fold cross-validation to determine the dominant variables and the best model. Our experimental results show that the Gaussian kernel, combined with type I pre-processing, is the most suitable kernel function for predicting the water level of the Galas River.

Key-Words: Galas River, Kelantan, support vector regression, missing value, water level, nonlinear regression, cross validation, variables selection.
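
The two pre-processing types can be illustrated with a minimal sketch, which covers only the missing-value step and not the SVR or cross-validation stages. It is not the authors' implementation: the function names `impute_mean` and `impute_linear`, the choice of rainfall as the single regressor, and the toy numbers are all assumptions made for illustration, and missing entries are assumed to be encoded as `None`.

```python
from statistics import mean


def impute_mean(values):
    """Type I (assumed form): fill each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_linear(x, y):
    """Type II (assumed form): fit y = intercept + slope * x by ordinary least
    squares on the complete pairs, then predict each missing y from its x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical toy data (not taken from the study): the last humidity reading is missing.
rainfall = [10.0, 20.0, 30.0, 40.0, 50.0]
humidity = [62.0, 64.0, 66.0, 68.0, None]

humidity_type1 = impute_mean(humidity)              # fills with the observed mean
humidity_type2 = impute_linear(rainfall, humidity)  # fills with the fitted line's prediction

print(humidity_type1[-1], humidity_type2[-1])
```

On this toy series the mean fill ignores the upward trend, while the regression fill extrapolates along it. Which behaviour is preferable depends on the data, and it is the kind of difference that the comparison of type I and type II pre-processing evaluates.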