Bedload transport rate prediction: Application of novel hybrid data mining techniques

Abstract The accurate prediction of bedload transport in gravel-bed rivers remains a significant challenge in river science. However, the potential for data mining algorithms to provide models of bedload transport has yet to be explored. This study provides the first quantification of the predictive power of a range of standalone and hybrid data mining models. Using bedload transport data collected in laboratory flume experiments, the performance of four recently developed standalone data mining techniques, namely the M5P, random tree (RT), random forest (RF) and reduced error pruning tree (REPT), is assessed, along with four hybrid algorithms trained with a Bagging (BA) data mining algorithm (BA-M5P, BA-RF, BA-RT and BA-REPT). The main findings are four-fold. First, the BA-M5P model had the highest predictive power (R² = 0.943; RMSE = 0.061 kg m⁻¹ s⁻¹; MAE = 0.040 kg m⁻¹ s⁻¹; NSE = 0.945; PBIAS = −1.60), followed by M5P, BA-RT, RT, BA-RF, RF, BA-REPT, and REPT. All models displayed ‘very good’ performance except the BA-REPT and REPT models, which were ‘satisfactory’. Second, the M5P, BA-RT, and RT models underestimated, and the BA-M5P, BA-RF, RF, BA-REPT and REPT models overestimated, bedload transport rates. Third, flow velocity had the most significant impact on bedload transport rate (PCC = 0.760), followed by shear stress (PCC = 0.709), discharge (PCC = 0.668), bed shear velocity (PCC = 0.663), bed slope (PCC = 0.490), flow depth (PCC = 0.303), median sediment diameter (PCC = 0.247), and relative roughness (PCC = 0.003). Fourth, the maximum depth of tree was the most sensitive operator in decision tree-based algorithms, whereas batch size, number of execution slots and number of decimal places did not have any impact on the models’ predictive power. Overall, the results revealed that hybrid data mining techniques provide more accurate predictions of bedload transport rate than standalone data mining models. In particular, M5P models trained with a Bagging data mining algorithm have great potential to produce robust predictions of bedload transport in gravel-bed rivers.
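
The goodness-of-fit statistics quoted in the abstract (RMSE, MAE, NSE, PBIAS and the Pearson correlation coefficient, PCC) can be illustrated with a minimal Python sketch. The formulas below are the commonly used definitions and are an assumption here, since the abstract does not state them; in particular, PBIAS is written so that negative values indicate overestimation, which agrees with the sign reported for BA-M5P. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pbias(obs, sim):
    """Percent bias (assumed convention: negative means the model overestimates)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def pcc(x, y):
    """Pearson correlation coefficient between an input variable and the target."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical observed and simulated bedload rates (kg m^-1 s^-1).
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [1.0, 2.0, 3.0, 6.0]
    for name, fn in [("RMSE", rmse), ("MAE", mae), ("NSE", nse), ("PBIAS", pbias)]:
        print(name, round(fn(observed, simulated), 3))
```

In this toy case the simulated series exceeds the observed one, so PBIAS comes out negative, mirroring the way the abstract distinguishes overestimating from underestimating models.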

[77]  W. N. H. W. Mohamed,et al.  A comparative study of Reduced Error Pruning method in decision tree algorithms , 2012, 2012 IEEE International Conference on Control System, Computing and Engineering.