Landslide susceptibility analyses using Random Forest, C4.5, and C5.0 with balanced and unbalanced datasets

Abstract: The effects of landslides have been increasing exponentially due to rapid urbanization and global climate change. The information gained from predictive models and landslide susceptibility analyses can be used to develop warning systems and mitigation measures. A comparative study was conducted to evaluate the effectiveness of landslide susceptibility analyses in a given area using three decision tree algorithms: Random Forest (RF), C4.5, and C5.0. Two imagery datasets (raster and vector) were used, and three combinations of the 13 conditioning factors (including seven geotechnical properties of the soil) were determined using Information Gain, Gain Ratio, the Chi-Squared Test, and Random Forest Importance. Datasets for the landslide conditioning factors were created from the outcomes of these feature selection methods in three scenarios. In Scenario 1, the least important factors (as identified by the information gain, chi-squared, and gain ratio measures) were eliminated. In Scenario 2, only the most important factors (as identified by the RF feature importance evaluation) were kept. In Scenario 3, no factor was eliminated; the data were used as obtained from the sources, without any feature selection. The performance of the models was evaluated using statistical verification scores. C4.5 had the highest performance when all 13 conditioning factors (Scenario 3) were used, for both the raster and the vector datasets. The RF model was the least effective in predicting landslides in all three scenarios. However, using the balanced vector dataset significantly increased the performance of the RF model. C4.5 and C5.0 handled extremely unbalanced data significantly better than RF. Density, silt and clay content, and the Atterberg limits (liquid limit, LL, and plasticity index, PI) were the most important geotechnical conditioning factors in the landslide susceptibility analyses performed.
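As an illustration of two of the feature selection measures named in the abstract, the following is a minimal sketch of information gain and gain ratio for a discrete conditioning factor. It is not the authors' implementation: the function names and the toy values (`slope_class`, `noise_factor`, `landslide`) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:  # feature takes a single value, so no split is possible
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: 1 = landslide cell, 0 = non-landslide cell.
landslide = [1, 1, 0, 0]
slope_class = ["steep", "steep", "gentle", "gentle"]  # separates the classes
noise_factor = ["a", "b", "a", "b"]                   # carries no information

print(information_gain(slope_class, landslide), gain_ratio(slope_class, landslide))
print(information_gain(noise_factor, landslide), gain_ratio(noise_factor, landslide))
```

In this toy case the slope class scores 1 bit on both measures and the noise factor scores 0, so a Scenario 1 style filter would drop the latter. Continuous factors such as density or clay content would need to be binned before these functions apply; how the study discretised them is not stated in the abstract.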
