An efficient prediction for heavy rain from big weather data using genetic algorithm

This paper proposes an approach for building an efficient and effective model that forecasts heavy rain 6 hours ahead from past weather data. Because weather data contains a huge amount of information, building such a model requires finding the proper weather attributes, isobaric surfaces, and a suitable range of area. First, we select candidate weather attributes using expert knowledge and evaluate them with a machine learning method, the Support Vector Machine (SVM). From this evaluation we identify the three best-performing pairs of an attribute and an isobaric surface and combine them to improve prediction performance. This combination of high-performing attributes outperformed the combination of attributes recommended by the experts. We next examine how the range of the area affects prediction. After these experiments, the best-performing data had 4,800 dimensions, which are used as the inputs to the prediction models. Even though this is a dramatic reduction in the number of dimensions compared to the original weather data, it is still not suitable for forecasting heavy rain 6 hours ahead: the model takes about 2 minutes to produce an output from 4,800 input dimensions. Two minutes is not short enough, since every local area may need to predict heavy rain from its own local heavy rain cases for more accurate forecasting. With 30 local areas, producing outputs for all of them would take an hour, which is not feasible when predicting heavy rain 6 hours ahead. Therefore, to build more efficient models, we apply a Genetic Algorithm (GA) to find a much smaller set of inputs without degrading performance. After running the GA, the 4,800 inputs are reduced to 757, and the running time of the model with 757 inputs is 1/8 that of the model with 4,800 inputs.
Finally, we compare the proposed GA with an information gain (IG) based feature selection method and show that the proposed GA selects more efficient features for predicting heavy rain cases.
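
The idea of GA-based input selection can be illustrated with a minimal sketch. It is not the implementation used in this work: the synthetic data, the nearest-centroid classifier (standing in for the SVM so that only the standard library is needed), the size penalty of 0.1, and all names and GA settings (`make_data`, `centroid_accuracy`, `run_ga`, population size, mutation rate) are assumptions made for illustration. Each individual is a bit mask over the inputs, and its fitness is validation accuracy minus a small penalty on the fraction of inputs kept.

```python
import random

def make_data(n_samples, n_features, informative, rng):
    """Synthetic stand-in for weather inputs: only `informative` columns carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so both classes are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(mask, X_train, y_train, X_val, y_val):
    """Accuracy of a nearest-centroid classifier restricted to the masked inputs."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_val)

def run_ga(fitness, n_features, rng, pop_size=20, generations=25, mutation_rate=0.02):
    """Simple GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    # Seed the population with the full input set so the result is never worse than it.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        best = max(scored, key=lambda s: s[0])[1]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [best[:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)

rng = random.Random(0)
N_FEATURES = 40   # illustrative; the paper starts from 4,800 inputs
PENALTY = 0.1     # assumed weight on the fraction of inputs kept
X, y = make_data(200, N_FEATURES, informative=range(5), rng=rng)
X_train, y_train, X_val, y_val = X[:140], y[:140], X[140:], y[140:]

_cache = {}
def fitness(mask):
    """Validation accuracy minus a penalty proportional to the number of inputs kept."""
    key = tuple(mask)
    if key not in _cache:
        acc = centroid_accuracy(mask, X_train, y_train, X_val, y_val)
        _cache[key] = acc - PENALTY * sum(mask) / N_FEATURES
    return _cache[key]

best_mask = run_ga(fitness, N_FEATURES, rng)
print("inputs kept:", sum(best_mask), "of", N_FEATURES)
print("validation accuracy:",
      centroid_accuracy(best_mask, X_train, y_train, X_val, y_val))
```

Because the full input set is placed in the initial population and the best individual is always carried over, the returned mask scores at least as well as using all inputs under this fitness. An IG-based baseline would instead rank inputs individually and keep the top-scoring ones, without evaluating them jointly.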