Evaluation of various boosting ensemble algorithms for predicting flood hazard susceptibility areas

Abstract: The purpose of the present study was to predict the areas affected by flood hazard in the Talar watershed, Mazandaran province, Iran, using three boosting ensemble models, Adaptive Boosting (AdaBoost), Boosted Generalized Linear Models (BGLM), and Extreme Gradient Boosting (XGB), together with a novel ensemble framework of deep decision trees, the Deep Boosting (DB) model. For this purpose, 14 flood conditioning variables were used as independent variables in flood hazard modeling. In addition, 130 flood points in the region were identified from field visits and available flood records and were used as the dependent variable. The results showed that all of the models predicted flood hazard well. The area under the curve (AUC) values of the BGLM, XGB, AdaBoost, and DB models were 0.88, 0.87, 0.89, and 0.91, respectively, indicating that the DB model was the most efficient for flood hazard modeling in the study area. The relative importance of the variables differed from model to model, but altitude and distance from the river were more important than the other variables. Although the machine learning models selected these two variables as the most important, other variables may also influence flood hazard.
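The AUC used above to rank the models can be read as the probability that a randomly chosen flood point receives a higher predicted susceptibility than a randomly chosen non-flood point. The following is a minimal sketch of that calculation in plain Python. The function name `roc_auc` and the labels and scores are hypothetical illustrations, not the study's data or code, and the sketch does not reproduce the reported values.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve via pairwise comparison.

    labels: 1 for a flood point, 0 for a non-flood point.
    scores: predicted susceptibility for each point (higher = more prone).
    Ties between a flood and a non-flood score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood point")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: three flood and three non-flood points.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical outputs of two models on the same points (assumed values).
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
scores_model_b = [0.9, 0.8, 0.7, 0.5, 0.3, 0.1]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

In this toy case model B separates the two classes perfectly, while model A misorders one flood and non-flood pair, so its AUC is lower. The pairwise loop is quadratic in the number of points, which is acceptable for a sample the size of the 130 flood points described here; a rank-based formula would be preferable for large rasters.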
