Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment presented and explained in the first part is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Artificial neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, the frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike in the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to that of GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is appropriately selected. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.
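The resampling and evaluation loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic dataset, the group size (half of the records), the 70/30 training/testing split, and the definition of deterioration as the relative increase in RMSE from training to testing are all assumptions made for illustration. Only MLR is fitted here, as the simplest of the seven techniques; the helper names (`make_realizations`, `fit_mlr`, `predict_mlr`, `rmse`) are hypothetical.

```python
import numpy as np


def make_realizations(n_records, n_groups=12, group_size=None, seed=0):
    """Draw n_groups index sets, each sampled without replacement."""
    rng = np.random.default_rng(seed)
    if group_size is None:
        group_size = n_records // 2  # assumed size, not taken from the paper
    return [rng.choice(n_records, size=group_size, replace=False)
            for _ in range(n_groups)]


def rmse(observed, predicted):
    """Root mean squared error, one common average overall error measure."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def fit_mlr(X, y):
    """Fit multiple linear regression (with intercept) by least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict with coefficients returned by fit_mlr."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Synthetic stand-in for a hydrological dataset (assumption).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(600, 3))
y = (1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.2 * X[:, 2]
     + data_rng.normal(scale=0.1, size=600))

results = []
for idx in make_realizations(len(X)):
    split = int(0.7 * len(idx))  # assumed 70/30 training/testing split
    train, test = idx[:split], idx[split:]
    coef = fit_mlr(X[train], y[train])
    rmse_train = rmse(y[train], predict_mlr(coef, X[train]))
    rmse_test = rmse(y[test], predict_mlr(coef, X[test]))
    # Assumed definition: relative change in RMSE from training to testing.
    deterioration = (rmse_test - rmse_train) / rmse_train
    results.append((rmse_train, rmse_test, deterioration))

for i, (tr, te, det) in enumerate(results, start=1):
    print(f"realization {i:2d}: train RMSE={tr:.3f} "
          f"test RMSE={te:.3f} deterioration={det:+.1%}")
```

Repeating the fit over all 12 realizations gives a spread of testing errors rather than a single value, which is what makes it possible to compare techniques on both accuracy and the variability of that accuracy.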
