Using gene expression programming to develop a combined runoff estimate model from conventional rainfall-runoff model outputs

In previous studies, artificial neural networks have been used to develop a model that combines simulated river flows from several individual rainfall-runoff models (e.g. Shamseldin et al., 1997; Abrahart & See, 2002; Shamseldin, O'Connor, & Nasr, 2007). The combined runoff estimate model was found to perform better than the individual models in most cases. However, no attempts have been made to explain the inner workings of the combined models or the drivers of their success. This study investigates the use of gene expression programming (GEP) to develop a combination rainfall-runoff model through symbolic regression. An additional advantage of this approach over the neural combination method is that the resulting model can be written as an explicit mathematical expression. The GEP model is developed using the daily simulated river flows of four rainfall-runoff models for the Chu catchment in Vietnam. The four models are the linear perturbation model (LPM), the linearly varying gain factor model (LVGFM), the probability-distributed interacting storage capacity (PDISC) model, and the soil moisture accounting and routing (SMAR) model. GeneXproTools 4.0, a soft computing software package, is used to develop the combined model. The program provides transparent modeling solutions in the sense that it gives the user the mathematical equation describing the combined model. The results reveal that combination using symbolic regression is successful and that a superior combined model can be developed using outputs from the individual models. The structure of the combined model is also investigated. The results show that the combined model is dominated by the output of the PDISC model, which forms the baseline estimate, to which different permutations and combinations of the outputs from the other models are added.
This research, limited to one river catchment, paves the way for further investigations into GEP model development for different types of catchment. The over-fitting of the training data observed during model development highlights the need to investigate appropriate stopping criteria.
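
The combination structure described above, a PDISC baseline adjusted by terms built from the other model outputs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the coefficients, and the form of the correction term are hypothetical and are not the expression evolved in the study. The Nash-Sutcliffe efficiency is used only because it is a common score in hydrology; the abstract does not state which criterion was used.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared simulation
    error to the squared deviation of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def combined_flow(lpm: float, lvgfm: float, pdisc: float, smar: float) -> float:
    """Hypothetical combined estimate for one day (NOT the study's equation).

    The PDISC output is the baseline; a small correction is built from the
    other three model outputs. The coefficients are illustrative only.
    """
    correction = 0.1 * (smar - lpm) + 0.05 * (lvgfm - pdisc)
    return pdisc + correction


def combine_series(lpm, lvgfm, pdisc, smar):
    """Apply the hypothetical expression to daily series of equal length."""
    return [combined_flow(a, b, c, d) for a, b, c, d in zip(lpm, lvgfm, pdisc, smar)]
```

In such a setup, the combined series would be scored against observed flows on data held out from training, which is also one simple way to detect the over-fitting noted above.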