Robustness analysis of machine learning classifiers in predicting spatial gully erosion susceptibility with altered training samples

Abstract This study assesses the robustness of three popular machine learning models, random forest (RF), boosted regression tree (BRT) and naïve Bayes (NB), for spatial gully erosion susceptibility modelling in the Jainti River basin, India. A gully inventory map of 208 gullies was prepared from field survey and Google Earth imagery. Following a 70/30 ratio, three randomly sampled groups of training and validation gully sets (G1, G2 and G3) were prepared for modelling gully erosion susceptibility. Fourteen gully conditioning factors (GCFs) were selected using information gain ratio and multi-collinearity analysis. The discrimination ability and reliability of the models were measured through the Kappa coefficient, efficiency, the receiver operating characteristic (ROC) curve, root-mean-square error (RMSE) and mean absolute error (MAE). Model stability was estimated by comparing the accuracy statistics and the departure in areal outcomes both within each model (intra-model) and between models (inter-model). With the highest mean area under the ROC curve (AUC, 0.903), efficiency (91.17) and Kappa coefficient (0.835), and the lowest RMSE (0.192) and MAE (0.081), RF was the most consistent model when the training and validation data sets were altered. The effectiveness of each input GCF was determined using the map removal sensitivity analysis technique. The findings could support model selection for mapping gully erosion and managing land resources.
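The sampling and accuracy statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the random seeds, the 0.5 classification threshold and the rounding of 70% of 208 to 146 training gullies are all assumptions made for illustration. Efficiency is taken here as the proportion of correctly classified samples, and RMSE and MAE are computed between predicted probabilities and the 0/1 labels.

```python
import math
import random


def split_inventory(n_gullies=208, train_frac=0.7, seed=0):
    """Randomly split gully IDs into training and validation sets.

    The seed and the rounding rule are illustrative assumptions.
    """
    ids = list(range(n_gullies))
    random.Random(seed).shuffle(ids)
    n_train = round(n_gullies * train_frac)
    return ids[:n_train], ids[n_train:]


def accuracy_stats(y_true, y_prob, threshold=0.5):
    """Return efficiency, Cohen's Kappa, RMSE and MAE for binary labels.

    y_true holds 0/1 labels; y_prob holds predicted susceptibility in [0, 1].
    The threshold is an assumed cut-off, not taken from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    efficiency = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (efficiency - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0

    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n)
    mae = sum(abs(p - t) for t, p in zip(y_true, y_prob)) / n
    return {"efficiency": efficiency, "kappa": kappa, "rmse": rmse, "mae": mae}


# Three altered training/validation groups, analogous to G1, G2 and G3.
groups = {name: split_inventory(seed=s) for name, s in [("G1", 1), ("G2", 2), ("G3", 3)]}
```

Under these assumptions, comparing the spread of `accuracy_stats` results for one model across the three groups gives an intra-model view of stability. Comparing the same statistics across models on the same group gives the inter-model view.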
