Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The potential of Extra-Trees is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when applied to large datasets. In addition, the ranking of the input variables that the method provides can be given a physically meaningful interpretation.
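The properties listed above (random splits, ensemble averaging, input ranking) can be illustrated with a minimal pure-Python sketch of an Extra-Trees regressor. It is not the implementation used in the paper. The names `build_tree`, `fit_extra_trees`, `k`, `n_min` and `n_trees` are hypothetical, the toy data are synthetic, and ranking inputs by accumulated variance reduction is an assumed scoring choice that the abstract does not specify.

```python
import random
from statistics import fmean, pvariance


def build_tree(X, y, k, n_min, rng, importance):
    """Grow one randomized regression tree; leaves are floats, nodes are tuples."""
    n, d = len(y), len(X[0])
    if n < n_min or pvariance(y) == 0:
        return fmean(y)  # leaf: mean of the local outputs
    ranges = {j: (min(r[j] for r in X), max(r[j] for r in X)) for j in range(d)}
    candidates = [j for j in range(d) if ranges[j][0] < ranges[j][1]]
    best = None
    # Draw up to k non-constant inputs and ONE random cut-point for each.
    for j in rng.sample(candidates, min(k, len(candidates))):
        cut = rng.uniform(*ranges[j])
        left = [i for i in range(n) if X[i][j] < cut]
        right = [i for i in range(n) if X[i][j] >= cut]
        if not left or not right:
            continue
        # Score the candidate split by the variance reduction it achieves.
        gain = (n * pvariance(y)
                - len(left) * pvariance([y[i] for i in left])
                - len(right) * pvariance([y[i] for i in right]))
        if best is None or gain > best[0]:
            best = (gain, j, cut, left, right)
    if best is None:
        return fmean(y)
    gain, j, cut, left, right = best
    importance[j] += gain  # assumed ranking rule: accumulate variance reduction
    return (j, cut,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       k, n_min, rng, importance),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       k, n_min, rng, importance))


def predict_tree(node, x):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        j, cut, left, right = node
        node = left if x[j] < cut else right
    return node


def fit_extra_trees(X, y, n_trees=30, k=None, n_min=5, seed=0):
    """Fit the ensemble on the whole sample (no bootstrap) and rank the inputs."""
    rng = random.Random(seed)
    k = k or len(X[0])
    importance = [0.0] * len(X[0])
    trees = [build_tree(X, y, k, n_min, rng, importance) for _ in range(n_trees)]
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]


def predict(trees, x):
    """Ensemble prediction: average of the individual tree outputs."""
    return fmean([predict_tree(t, x) for t in trees])


# Synthetic toy data: the output depends on input 0 only; input 1 is irrelevant.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [2.0 * row[0] for row in X]

trees, imp = fit_extra_trees(X, y)
pred = predict(trees, [0.5, 0.5])
print("relative importance:", imp)
print("prediction at x0 = 0.5:", pred)
```

In this toy setting the irrelevant input should receive a much smaller share of the total variance reduction, which is the kind of ranking the abstract refers to. Production work would more likely rely on an established library implementation than on a sketch like this.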
