Conditioning factor determination for mapping and prediction of landslide susceptibility using machine learning algorithms

Landslides are a type of natural geohazard that interferes with many economic and social activities and causes serious harm to human life. They rank among the major disasters, threatening life, property, and the environment, so early prediction of landslide-prone areas is vital. A variety of causative factors, such as glacier melting, excessive rainfall, mining, volcanic activity, active faults, earthquakes, logging, erosion, urbanization, construction, and other human activities, can trigger landslides. Identifying the factors that directly influence slide events is therefore in high demand. Several topographical, geological, and hydrological datasets (e.g., slope, aspect, geology, terrain roughness, vegetation index, distance to stream, distance to road, distance to fault, land use, precipitation, profile curvature, and plan curvature) are considered effective conditioning factors. However, the importance of each factor differs from one study to another. This study investigates the effectiveness of four sets of landslide conditioning variables. Fourteen conditioning variables were considered and divided into four groups: G1, G2, G3, and G4. Three machine learning algorithms, namely Random Forest (RF), Naive Bayes (NB), and Boosted Logistic Regression (LogitBoost), were trained on each dataset to determine which set is most suitable for landslide susceptibility prediction. In total, 227 landslide inventory samples from the study area were used, 70% for training and 30% for testing.
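The 70/30 hold-out split and one of the three classifiers can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed seed, rounding the training share down, and the Gaussian form of Naive Bayes are our assumptions (the abstract does not say which NB variant was used). The factors are assumed to be numerically encoded; categorical layers such as land use or geology would more naturally use frequency-based likelihoods. RF and LogitBoost are omitted to keep the sketch short, and the names `split_70_30` and `GaussianNaiveBayes` are hypothetical.

```python
import math
import random
from collections import defaultdict


def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples with a fixed (assumed) seed and split 70/30, rounding the training share down."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


class GaussianNaiveBayes:
    """Hypothetical Naive Bayes with one Gaussian per (class, factor); factors treated as conditionally independent."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors, self.params = {}, {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small variance floor avoids division by zero for a factor that is constant within a class.
            variances = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                         for col, m in zip(columns, means)]
            self.params[label] = (means, variances)
        return self

    def _log_joint(self, row, label):
        """log P(class) plus the sum of per-factor Gaussian log-likelihoods."""
        means, variances = self.params[label]
        score = math.log(self.priors[label])
        for x, m, var in zip(row, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var) + (x - m) ** 2 / (2 * var)
        return score

    def predict_proba(self, row, positive=1):
        """Posterior probability of the positive (landslide) class, usable as a susceptibility score."""
        logs = {label: self._log_joint(row, label) for label in self.params}
        top = max(logs.values())  # subtract the maximum before exponentiating for numerical stability
        weights = {label: math.exp(s - top) for label, s in logs.items()}
        return weights[positive] / sum(weights.values())
```

Under these assumptions, `split_70_30(range(227))` yields 158 training and 69 testing samples.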
The present research had two main objectives. The first was to investigate the effectiveness of 14 landslide conditioning factors (altitude, slope, aspect, total curvature, profile curvature, plan curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), Terrain Roughness Index (TRI), distance to fault, distance to road, distance to stream, land use, and geology) by identifying the most important factors with the variance inflation factor (VIF), Pearson’s correlation, and Chi-square techniques. Accordingly, four datasets were defined: the first included all 14 conditioning factors; the second included Digital Elevation Model (DEM) derivatives (morphometric factors); the third was based on only five factors, namely lithology, land use, distance to stream, distance to road, and distance to fault; and the last (G4) included eight factors selected through factor analysis and optimization. The second objective was to evaluate the sensitivity of each modeling technique (NB, RF, and LogitBoost) to the different sets of conditioning factors using the area under the curve (AUC). RF with the optimized variables (G4) performed best, with an AUC of 0.940, followed by LogitBoost (0.898) and NB (0.864).
