Development of a stepwise M5 model tree to determine the influential factors in rainfall prediction and overcome the greedy problem of its algorithm

Large-scale climatic phenomena with delayed effects may serve as important variables for stepwise prediction of rainfall, but the interaction of these signals in the occurrence of rainfall makes the relationships non-linear and complex. A model tree is a promising tool for modeling complex systems and identifying the most significant variables. The model tree approach uses a greedy algorithm in which increasing the number of variables does not necessarily improve the accuracy of the model; hence the model should be run stepwise. In this study, a stepwise M5 model tree was used to predict annual rainfall at Hashem Abad station in northern Iran, using observed data and 17 climatic signals for the 1985-2019 period, in order to determine the most significant variables. For this purpose, 131017 subsets of the 17 signals were produced, and the M5 model tree was fitted to each of them. The best combination of variables simulated the rainfall with an error of 36 mm (less than 5-6%) and a correlation coefficient of 0.94. Among the climatic signals, the sunspot signal (SP) was placed at the tree root (most significant), and Nino 4, EA and NAO were ranked as the next most significant predictors, in that order. The results also indicated that, owing to the nature of rainfall variations and the greedy algorithm of the M5 model, stepwise modeling is necessary. The delayed effects of teleconnections may therefore be considered a suitable feature for early prediction of the next year's rainfall and for capturing its inter-annual variation.
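The subset search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `non_empty_subsets`, `best_subset` and `toy_error` are hypothetical, and `toy_error` is an invented stand-in for "fit an M5 model tree on these predictors and return its error", with made-up numbers chosen only to show that adding variables need not lower the error. Only the signal names SP, Nino4, EA and NAO come from the text.

```python
from itertools import combinations


def non_empty_subsets(names):
    """Yield every non-empty subset of `names` as a tuple, smallest first."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def best_subset(names, error_of):
    """Return (subset, error) with the lowest error over all non-empty subsets.

    `error_of` is a caller-supplied function; in a real study it would fit a
    model on the given predictors and return a validation error.
    """
    scored = ((subset, error_of(subset)) for subset in non_empty_subsets(names))
    return min(scored, key=lambda pair: pair[1])


def toy_error(subset):
    """Invented error function (assumption, not real data).

    Two signals are treated as informative; every other signal adds a
    small penalty, so the full set scores worse than the best pair.
    """
    useful = {"SP", "Nino4"}
    hits = len(useful & set(subset))
    noise = len(set(subset) - useful)
    return 100.0 - 30.0 * hits + 5.0 * noise


signals = ["SP", "Nino4", "EA", "NAO"]
subset, err = best_subset(signals, toy_error)
n_subsets = sum(1 for _ in non_empty_subsets(signals))
print(subset, err, n_subsets)
```

With four candidate signals the search visits 2^4 - 1 = 15 subsets. A full enumeration of 17 candidates would visit 2^17 - 1 = 131071 non-empty subsets, which is of the same order as the count reported above and is the reason such a search is feasible only for a moderate number of signals.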