Flash flood susceptibility modeling using an optimized fuzzy rule-based feature selection technique and tree-based ensemble methods

The main objective of the present study was to provide a novel methodological approach for flash flood susceptibility modeling based on a feature selection method (FSM) and tree-based ensemble methods. The FSM used the fuzzy unordered rule induction algorithm (FURIA) as the attribute evaluator and a genetic algorithm (GA) as the search method, in order to obtain the optimal set of variables for flood susceptibility assessment. The resulting FURIA-GA approach was combined with the LogitBoost, Bagging and AdaBoost ensemble algorithms. The performance of the developed methodology was evaluated in the Bao Yen and Bac Ha districts of Lao Cai Province, in the Northeast region of Vietnam. For this case study, 654 flood locations and twelve geo-environmental variables were used. The predictive performance of each model was estimated by calculating the classification accuracy, the sensitivity, the specificity, the success and prediction rate curves, and the area under these curves (AUC). Compared with a conventional rule-based method, the FURIA-GA FSM gave more accurate predictions. The FURIA-GA-based models also showed higher learning and predictive ability than the ensemble models that had not undergone feature selection. Based on predictive classification accuracy, FURIA-GA-Bagging (93.37%) outperformed FURIA-GA-LogitBoost (92.35%) and FURIA-GA-AdaBoost (89.03%). FURIA-GA-Bagging also showed the highest sensitivity (96.94%) and specificity (89.80%). On the other hand, FURIA-GA-LogitBoost showed the lowest percentage in the very high susceptibility zone and the highest relative flash flood density, whereas FURIA-GA-AdaBoost achieved the highest AUC on the prediction rate curve (0.9740), followed by FURIA-GA-Bagging (0.9566) and FURIA-GA-LogitBoost (0.8955). It can be concluded that different statistical metrics point to different best prediction models, which could mainly be attributed to site-specific settings.
The proposed models could be considered novel alternative investigation tools for flash flood susceptibility mapping.
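The GA search step of the FSM can be illustrated with a minimal sketch: each candidate solution is a binary mask over the geo-environmental variables, and a fitness function scores each mask. In the study that score comes from FURIA acting as attribute evaluator; FURIA is not reproduced here, so `fitness` is a caller-supplied function (an assumption), and the operators (binary tournament selection, single-point crossover, bit-flip mutation, elitism) and their default rates are illustrative choices, not necessarily the settings the authors used.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]  # one gene per variable: 1 = keep, 0 = drop


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 20,
    crossover_rate: float = 0.6,
    mutation_rate: float = 0.033,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Generational GA over binary feature masks; returns (best_mask, best_score).

    `fitness` stands in for the attribute evaluator (FURIA in the study) and
    must return a score to maximize for a given mask.
    """
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(mask: Mask) -> float:
        # Wrapper fitness means training a classifier, so cache every evaluation.
        if mask not in cache:
            # An empty subset cannot train a model: give it the worst score.
            cache[mask] = fitness(mask) if any(mask) else float("-inf")
        return cache[mask]

    def tournament(pop: List[Mask]) -> Mask:
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    best = max(population, key=score)
    for _ in range(generations):
        offspring: List[Mask] = [best]  # elitism: the best mask always survives
        while len(offspring) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to every gene.
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            offspring.append(child)
        population = offspring
        best = max(population, key=score)
    return best, score(best)
```

For the twelve variables of the case study one would call `ga_feature_selection(12, my_fitness)`, where `my_fitness` is a hypothetical function that trains the evaluator on the kept columns and returns, for example, cross-validated accuracy. Because elitism carries the best mask into every new generation, the best score found never decreases from one generation to the next.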
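Of the three ensembles, Bagging gave the best classification accuracy. Its mechanism (train each base learner on a bootstrap resample, then combine them by majority vote) can be sketched from scratch. The abstract does not name the base classifier, so one-level decision trees (stumps) on numeric variables are assumed here purely for illustration; a production workflow would more likely use an existing Weka or scikit-learn implementation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label predicted when x[f] > threshold)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Choose the single split with the fewest training errors (binary labels 0/1)."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidates: one threshold below the minimum (constant prediction) plus midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for above in (0, 1):
                err = sum(
                    (above if row[f] > thr else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if err < best_err:
                    best, best_err = (f, thr, above), err
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> int:
    f, thr, above = stump
    return above if x[f] > thr else 1 - above


def bagging_fit(X, y, n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Fit one stump per bootstrap resample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Stump], x: Sequence[float]) -> int:
    """Majority vote; an odd number of estimators avoids ties for binary labels."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Averaging over bootstrap resamples mainly reduces variance, which is one plausible reason Bagging is robust on noisy inventories; the boosting variants (AdaBoost, LogitBoost) instead reweight training samples sequentially and are not shown here.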
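The evaluation metrics named in the abstract can be written out explicitly for binary labels (1 = flood, 0 = non-flood). The sketch below computes accuracy, sensitivity and specificity from the confusion matrix, and AUC as the probability that a randomly chosen flood sample receives a higher susceptibility score than a randomly chosen non-flood sample (ties count one half), which equals the area under the ROC curve. It assumes the success and prediction rate curves are ROC curves on training and validation data respectively; if they are instead built from cumulative area against cumulative flood occurrences, the computation differs.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = flood, 0 = non-flood)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of flood samples correctly detected
        "specificity": tn / (tn + fp),  # share of non-flood samples correctly rejected
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of flood vs. non-flood scores (O(P*N), fine for a sketch)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

Applying `roc_auc` to training samples gives a success-rate style AUC (goodness of fit), while applying it to held-out samples gives a prediction-rate style AUC (predictive ability), under the ROC assumption stated above.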
