Genetic programming for hydrological applications: to model or to forecast that is the question

Genetic programming (GP) is a widely used machine learning (ML) algorithm that has been applied in water resources science and engineering since its conception in the early 1990s. However, as with other ML applications, GP is often used as a data-fitting tool rather than as a model-building instrument. We consider this a gross underutilization of GP's capabilities. The feature that most clearly distinguishes GP from other ML techniques is its capability to produce explicit mathematical relationships between input and output variables. This capability is especially relevant to theory-guided data science (TGDS), which recently emerged as a new paradigm in ML with the main goal of blending the existing body of knowledge with ML techniques to induce physically sound models. TGDS has since evolved into a popular data science paradigm, especially in scientific disciplines including water resources. Following these ideas, in our prior work, we developed two hydrologically informed rainfall-runoff model induction toolkits based on GP, one for lumped modelling and one for distributed modelling. In the current work, the two toolkits are applied using a different hydrological model building library. Here, the model building blocks are derived from the Sugawara TANK model template, which represents the elements of hydrological knowledge. Results are compared against the traditional GP approach and suggest that GP as a rainfall-runoff model induction toolkit preserves the prediction power of the traditional GP short-term forecasting approach while also helping to better understand the catchment runoff dynamics through the readily interpretable induced models.
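To make the idea of a TANK-style building block concrete, the following is a minimal sketch of a single conceptual tank with one side outlet (runoff) and one bottom outlet (percolation). It is an illustration under stated assumptions and not the toolkits' actual building-block library: the function names `tank_step` and `simulate_tank`, the parameter names, and the linear outlet laws are all hypothetical choices made here for clarity.

```python
from typing import List, Sequence, Tuple


def tank_step(
    storage: float,
    rain: float,
    evap: float,
    side_coeff: float,
    side_height: float,
    bottom_coeff: float,
) -> Tuple[float, float, float]:
    """Advance one tank by a single time step.

    Hypothetical single-tank element (assumed linear outlets):
      runoff      = side_coeff * max(storage - side_height, 0)
      percolation = bottom_coeff * storage
    Assumes side_coeff + bottom_coeff <= 1 so storage stays non-negative.
    Returns (new_storage, runoff, percolation).
    """
    # Add rainfall and remove evaporation, never letting storage go negative.
    storage = max(storage + rain - evap, 0.0)
    # Side outlet releases water only above its threshold height.
    runoff = side_coeff * max(storage - side_height, 0.0)
    # Bottom outlet drains in proportion to the current storage.
    percolation = bottom_coeff * storage
    return storage - runoff - percolation, runoff, percolation


def simulate_tank(
    rain_series: Sequence[float],
    evap_series: Sequence[float],
    side_coeff: float = 0.2,
    side_height: float = 5.0,
    bottom_coeff: float = 0.1,
    initial_storage: float = 0.0,
) -> Tuple[List[float], List[float], float]:
    """Run the tank over a forcing series.

    Returns (runoff series, percolation series, final storage).
    """
    storage = initial_storage
    runoff_out: List[float] = []
    perc_out: List[float] = []
    for rain, evap in zip(rain_series, evap_series):
        storage, runoff, perc = tank_step(
            storage, rain, evap, side_coeff, side_height, bottom_coeff
        )
        runoff_out.append(runoff)
        perc_out.append(perc)
    return runoff_out, perc_out, storage


if __name__ == "__main__":
    rain = [10.0, 0.0, 4.0, 0.0]
    evap = [0.0, 0.0, 0.0, 0.0]
    q, p, s_end = simulate_tank(rain, evap)
    print("runoff:", q)
    print("percolation:", p)
    print("final storage:", s_end)
```

In a model induction setting, elements of this kind would be the primitives that GP arranges and parameterises, so the induced model remains a readable composition of storages and fluxes rather than an opaque fit.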
