Monthly Rainfall-Runoff Modeling at Watershed Scale: A Comparative Study of Data-Driven and Theory-Driven Approaches

Data-driven machine learning approaches have developed rapidly over the past 10 to 20 years and have been applied to a variety of problems in hydrology. To investigate how well data-driven approaches perform in rainfall-runoff modeling compared with theory-driven models, we conducted a comparative study of simulated monthly surface runoff at 203 watersheds across the contiguous United States using a conceptual model (the proportionality hydrologic model) and a data-driven Gaussian process regression model. With the same input variables, precipitation and mean monthly aridity index, the two models showed similar performance. We then introduced two more input variables into the data-driven model, potential evaporation and the normalized difference vegetation index (NDVI), both selected on the basis of hydrologic knowledge. The modified data-driven model performed much better than either the conceptual model or the original data-driven model. A sensitivity analysis of all three models showed that surface runoff responded positively to increased precipitation. However, the sensitivity of surface runoff to mean monthly aridity index, potential evaporation, and NDVI was confounded by complex interconnections among energy supply, vegetation coverage, and climate seasonality in the watershed system. We also conducted an uncertainty analysis of the two data-driven models, which showed that both had reasonable predictability within the 95% confidence interval. With the two additional input variables, the modified data-driven model had lower prediction uncertainty and higher prediction accuracy.
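As an illustration of how a Gaussian process regression model yields both a runoff prediction and a 95% interval, the following is a minimal sketch rather than the configuration used in the study. The abstract does not state the kernel, hyperparameters, or data handling, so the squared-exponential kernel, the fixed hyperparameter values, the helper names, and the toy standardized inputs (precipitation, aridity index) are all assumptions made for illustration.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two input vectors (assumed kernel)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transpose(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_predict(X, y, x_new, noise_var=0.01):
    """Posterior mean, predictive variance, and 95% interval of a zero-mean GP."""
    n = len(X)
    # Training covariance with observation noise added on the diagonal.
    K = [[sq_exp_kernel(X[i], X[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y))  # K^{-1} y
    k_star = [sq_exp_kernel(xi, x_new) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = sq_exp_kernel(x_new, x_new) + noise_var - sum(vi * vi for vi in v)
    half_width = 1.96 * math.sqrt(max(var, 0.0))  # normal 95% interval
    return mean, var, (mean - half_width, mean + half_width)

# Hypothetical standardized inputs: [precipitation, aridity index] per month.
X_train = [[-1.0, 0.5], [0.0, 0.0], [1.0, -0.5], [2.0, -1.0]]
# Hypothetical standardized monthly surface runoff.
y_train = [-0.8, -0.1, 0.6, 1.5]

mean, var, (lower, upper) = gp_predict(X_train, y_train, [0.5, -0.25])
print(f"predicted runoff {mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

In this sketch, adding further inputs such as potential evaporation or NDVI would only lengthen each input vector. Far from the training data the predictive variance returns to the prior level, which is how the interval widens where the model has little information.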
