Hybrid ensemble machine learning approaches for landslide susceptibility mapping using different sampling ratios at East Sikkim Himalayan, India

Abstract Landslides are a major hazard in mountainous regions worldwide, and the Sikkim Himalaya is no exception. The main objective of this study was to generate landslide susceptibility maps (LSMs) using hybrid ensembles of machine learning approaches with different sampling ratios. Random Forest (RF) served as the base classifier and was combined with the Bagging, Rotation Forest (RTF), and Random Subspace (RS) meta-classifiers for spatial landslide modeling. First, 86 landslide locations, collected through field investigation and from the Sikkim district disaster office, were mapped as a landslide inventory. These locations were then randomly divided into training and testing datasets using four sampling ratios (50:50, 60:40, 70:30, and 80:20). Based on the four sampling ratios and fifteen conditioning factors, a total of sixteen LSMs were prepared using RF, Bagging-RF (B-RF), RTF-RF, and RS-RF on a GIS platform. To assess and compare modeling accuracy, the area under the receiver operating characteristic curve (AUROC) was used together with other statistical measures: root-mean-square error (RMSE), mean absolute error (MAE), and the R-index. The overall performance of RS-RF (AUC = 0.871 and 0.847 for 50:50; 0.925 and 0.931 for 60:40; 0.933 and 0.939 for 70:30; 0.927 and 0.933 for 80:20) was found to be substantially greater than that of RF, B-RF, and RTF-RF. The RS-RF model with the 70:30 sampling ratio had the highest goodness-of-fit and accuracy according to the RMSE, MAE, and R-index. The RS-RF-based model is therefore a promising and acceptable approach for regional landslide susceptibility mapping.
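The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the landslide inventory and the fifteen conditioning factors, so the numbers it prints say nothing about the study's results. The helper names (`make_models`, `evaluate`) and all hyperparameters are hypothetical choices made for illustration. Rotation Forest is omitted because scikit-learn does not provide it. Random Subspace is approximated by a bagging wrapper that keeps every sample but draws a random half of the features for each member, and RMSE and MAE are computed here between the predicted probabilities and the 0/1 labels, which may differ from the authors' exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for landslide / non-landslide samples with 15 factors.
# This is NOT the study's data; sample size and informativeness are assumptions.
X, y = make_classification(n_samples=400, n_features=15, n_informative=8,
                           random_state=0)

def make_models():
    """Build RF and two RF-based ensembles (hypothetical hyperparameters)."""
    def base():
        return RandomForestClassifier(n_estimators=25, random_state=0)
    return {
        "RF": base(),
        # Bagging: each member RF is trained on a bootstrap sample of the rows.
        "B-RF": BaggingClassifier(base(), n_estimators=5, random_state=0),
        # Random Subspace: all rows, but a random 50% of the features per member.
        "RS-RF": BaggingClassifier(base(), n_estimators=5, bootstrap=False,
                                   max_features=0.5, random_state=0),
    }

def evaluate(test_fraction):
    """Fit each model on one train/test split and score it on the test part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_fraction, stratify=y, random_state=0)
    scores = {}
    for name, model in make_models().items():
        model.fit(X_tr, y_tr)
        p = model.predict_proba(X_te)[:, 1]  # probability of the positive class
        scores[name] = {
            "auc": float(roc_auc_score(y_te, p)),
            "rmse": float(np.sqrt(np.mean((y_te - p) ** 2))),
            "mae": float(np.mean(np.abs(y_te - p))),
        }
    return scores

# The four training:testing ratios named in the abstract.
ratios = [(50, 50), (60, 40), (70, 30), (80, 20)]
results = {f"{tr}:{te}": evaluate(te / 100) for tr, te in ratios}

for ratio, scores in results.items():
    for name, s in scores.items():
        print(f"{ratio} {name:6s} AUC={s['auc']:.3f} "
              f"RMSE={s['rmse']:.3f} MAE={s['mae']:.3f}")
```

With real data, `X` would hold the conditioning-factor values sampled at landslide and non-landslide locations, and the fitted model would then be applied to every raster cell to produce the susceptibility map.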
