Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms

Summary

Streamflow forecasting is one of the most important steps in water resources planning and management. Ensemble techniques such as bagging, boosting and stacking have gained popularity in hydrological forecasting in recent years. This study investigates the potential of two ensemble learning paradigms (bagging and stochastic gradient boosting) for building ensembles of classification and regression trees (CARTs) to improve streamflow prediction accuracy. The study first investigates the use of classification and regression trees for monthly streamflow forecasting, employing a support vector regression (SVR) model as the benchmark. The analytic results indicate that CART outperforms SVR in both the training and testing phases. However, although the CART model performs well in the training phase, its performance in the testing phase is less satisfactory. Therefore, to improve the prediction accuracy of CART for monthly streamflow forecasting, we incorporate bagging and stochastic gradient boosting, both of which rest on the same idea: improving the prediction accuracy of weak learners. The results show that the bagged regression tree (BRT) and stochastic gradient boosted regression tree (GBRT) models provide more satisfactory monthly streamflow forecasts than the CART and SVR models. Overall, ensemble learning paradigms are found to substantially improve the prediction accuracy of CART models in monthly streamflow forecasting.
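The two ensemble paradigms can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the small greedy least-squares tree stands in for a full CART (no pruning), and the tree depth, number of trees, shrinkage (`learning_rate`) and subsampling fraction are illustrative assumptions. Bagging averages trees grown on bootstrap resamples drawn with replacement; stochastic gradient boosting fits each new tree to the current residuals (the negative gradient of squared-error loss) on a random subsample drawn without replacement, then adds it with shrinkage. The lagged-flow inputs and the synthetic series in the usage section are hypothetical, not the study's data.

```python
import math
import random


def _mean(vals):
    return sum(vals) / len(vals)


def _sse(vals):
    # Sum of squared errors around the mean: the least-squares split criterion
    mu = _mean(vals)
    return sum((v - mu) ** 2 for v in vals)


def fit_tree(X, y, depth):
    """Greedily grow a small least-squares regression tree (CART-style, unpruned).

    X is a list of feature tuples, y a list of targets. Returns either
    ("leaf", value) or ("split", feature, threshold, left, right)."""
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            values = sorted(set(x[j] for x in X))
            for t in values[:-1]:  # skip the max so both children are non-empty
                left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
                right = [yi for xi, yi in zip(X, y) if xi[j] > t]
                score = _sse(left) + _sse(right)
                if best is None or score < best[0]:
                    best = (score, j, t)
    if best is None:  # depth exhausted or no feature can be split
        return ("leaf", _mean(y))
    _, j, t = best
    li = [i for i, xi in enumerate(X) if xi[j] <= t]
    ri = [i for i, xi in enumerate(X) if xi[j] > t]
    return ("split", j, t,
            fit_tree([X[i] for i in li], [y[i] for i in li], depth - 1),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1))


def predict_tree(node, x):
    while node[0] == "split":
        _, j, t, left, right = node
        node = left if x[j] <= t else right
    return node[1]


def bagged_trees(X, y, n_trees=25, depth=3, seed=0):
    """Bagging: average trees grown on bootstrap resamples (with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth))
    return lambda x: _mean([predict_tree(t, x) for t in trees])


def sgb_trees(X, y, n_trees=100, depth=2, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss."""
    rng = random.Random(seed)
    n = len(X)
    f0 = _mean(y)  # initial constant model
    pred = [f0] * n
    m = min(n, max(2, int(subsample * n)))
    trees = []
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)  # random subsample without replacement
        resid = [y[i] - pred[i] for i in idx]  # negative gradient of squared loss
        tree = fit_tree([X[i] for i in idx], resid, depth)
        trees.append(tree)
        # Update the fitted values on the full training set, with shrinkage
        pred = [p + learning_rate * predict_tree(tree, X[i]) for i, p in enumerate(pred)]
    return lambda x: f0 + learning_rate * sum(predict_tree(t, x) for t in trees)


if __name__ == "__main__":
    # Hypothetical synthetic "monthly flow" series with a 12-month cycle;
    # inputs are the two previous months' flows (an assumed predictor set).
    rng = random.Random(42)
    q = [10 + 5 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1) for t in range(120)]
    X = [(q[t - 1], q[t - 2]) for t in range(2, len(q))]
    y = q[2:]
    Xtr, ytr, Xte, yte = X[:90], y[:90], X[90:], y[90:]
    cart = fit_tree(Xtr, ytr, depth=3)
    models = {
        "CART": lambda x: predict_tree(cart, x),
        "BRT": bagged_trees(Xtr, ytr),
        "GBRT": sgb_trees(Xtr, ytr),
    }
    for name, f in models.items():
        rmse = math.sqrt(_mean([(f(x) - yi) ** 2 for x, yi in zip(Xte, yte)]))
        print(f"{name}: test RMSE = {rmse:.3f}")
```

In this sketch, bagging mainly reduces the variance of an unstable learner such as a regression tree, whereas boosting reduces bias by sequentially correcting residuals; the random subsampling in the stochastic variant adds further randomness between iterations. The printed RMSE values depend on the synthetic data and settings and say nothing about the study's results.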
