Flood susceptibility assessment based on a novel random Naïve Bayes method: A comparison between different factor discretization methods

Abstract Random Naive Bayes (RNB) is a machine learning method that uses the Random Forest (RF) structure to optimize Naive Bayes (NB). It is of interest whether RNB can improve on NB and achieve satisfactory results comparable to RF in flood susceptibility assessment. RNB has rarely been used in machine-learning-based spatial analysis of natural disasters, and it was therefore selected as the analysis method. Based on data availability, 12 spatial factors that affect the occurrence and spatial distribution of floods were selected. To avoid the subjectivity of equal-interval classification, the natural breaks and quantile methods were each used to discretize the continuous factors. A recently proposed repeatedly random sampling method was adopted to select negative samples for RNB and to generate a most accurate classifier (MAC), which was then used to compute the probability of flood occurrence in the study area. This paper thus integrates GIS and RNB in a framework for spatially assessing flood susceptibility, using Wanan County, China, as a case study. The results demonstrate that, when combined with the repeatedly random sampling method, the MAC-based flood susceptibility maps obtained with the different factor discretization methods were similar, meaning that the framework can effectively avoid the effects of the choice of discretization method. To test the classification performance of RNB, it was also compared with RF and NB. The results ranked classification performance in the order RF > RNB > NB. Thus RNB achieves better classification performance than NB, but it has limitations compared with traditional strong classifiers such as RF. The findings show that RNB is a feasible approach for natural hazard susceptibility assessment.
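As a rough illustration of the idea described in the abstract, the sketch below bags Naive Bayes classifiers the way Random Forest bags trees: each base classifier is trained on a bootstrap sample and sees only a random subset of the discretized factors, and the ensemble averages their probabilities. It is not the authors' implementation. All names (`quantile_cuts`, `discretize`, `CategoricalNB`, `RandomNB`), the square-root default for the subset size, the Laplace smoothing, and the toy data are assumptions made for this example. Natural breaks discretization and the repeatedly random sampling of negative samples are not shown.

```python
import math
import random
from collections import Counter, defaultdict


def quantile_cuts(values, k):
    """Cut points splitting `values` into k classes of roughly equal count."""
    s = sorted(values)
    return [s[(i * len(s)) // k] for i in range(1, k)]


def discretize(value, cuts):
    """Class index of `value`: the number of cut points it reaches or exceeds."""
    return sum(value >= c for c in cuts)


class CategoricalNB:
    """Naive Bayes over discrete factors, restricted to a subset of columns."""

    def fit(self, X, y, features):
        self.features = list(features)
        self.prior = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (factor, class) -> value counts
        self.levels = defaultdict(set)      # factor -> values seen in training
        for row, label in zip(X, y):
            for f in self.features:
                self.counts[(f, label)][row[f]] += 1
                self.levels[f].add(row[f])
        return self

    def predict_proba(self, row):
        log_post = {}
        for c, n_c in self.prior.items():
            lp = math.log(n_c / self.n)
            for f in self.features:
                # Laplace smoothing keeps unseen values from zeroing the product.
                seen = self.counts[(f, c)][row[f]]
                lp += math.log((seen + 1) / (n_c + len(self.levels[f])))
            log_post[c] = lp
        top = max(log_post.values())
        weights = {c: math.exp(v - top) for c, v in log_post.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}


class RandomNB:
    """Bagged Naive Bayes with a random factor subset per base classifier."""

    def __init__(self, n_estimators=50, max_features=None, seed=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, d = len(X), len(X[0])
        # Assumed default: sqrt(d) factors per classifier, as is common for RF.
        m = self.max_features or max(1, int(math.sqrt(d)))
        self.models = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            feats = rng.sample(range(d), m)             # random factor subset
            Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
            self.models.append(CategoricalNB().fit(Xb, yb, feats))
        return self

    def predict_proba(self, row, positive=1):
        """Average probability of the positive (flood) class over all models."""
        probs = [mdl.predict_proba(row).get(positive, 0.0) for mdl in self.models]
        return sum(probs) / len(probs)


# Toy data (made up): factor 0 fully determines the label; the other two
# factors carry little or no signal.
X = [[i % 2, (i // 2) % 3, (i // 3) % 2] for i in range(60)]
y = [row[0] for row in X]
model = RandomNB(n_estimators=25, max_features=2, seed=42).fit(X, y)
p_flood = model.predict_proba([1, 0, 0])
p_dry = model.predict_proba([0, 0, 0])
```

In a GIS workflow, each continuous factor raster would first be mapped to class indices with `discretize` (using cut points from `quantile_cuts` or a natural breaks routine), and `predict_proba` would then be evaluated per grid cell to produce a susceptibility map.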
