Analysis of training sample selection strategies for regression-based quantitative landslide susceptibility mapping methods

All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types: a landslide inventory and the factors that influence landslide occurrence (landslide influencing factors, LIF). The accuracy of QLSM methods differs depending on the type of landslides, the nature of the triggers, and the LIF. Moreover, how the numbers of 0s (nonoccurrence) and 1s (occurrence) are balanced in the training set obtained from the landslide inventory, and which of the 1s and 0s are included in the QLSM models, play a critical role in the accuracy of QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately investigated. To tackle this challenge, this study uses three different training set selection strategies, along with the original data set, to test the performance of three regression methods: Logistic Regression (LR), Bayesian Logistic Regression (BLR), and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. The third, selective nearby sampling (SNS), concentrates on the group of 1s and their surrounding neighborhood: a randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to those of NNS. It is found that the LR-PRS, FLR-PRS, and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoiding spatial correlation in the data set is critical for the model's performance.
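The three sampling strategies can be illustrated with a minimal Python sketch on a toy grid. This is not the authors' implementation: the abstract gives no algorithmic detail, so the function names, the `positive_fraction` parameter used to weight landslide cells in PRS, the square neighborhood defined by `radius`, and the toy grid itself are all assumptions made for illustration.

```python
import random

def proportional_random_sample(labels, n_samples, positive_fraction, rng):
    """Draw cells at random so that a chosen fraction are landslide cells (1s).

    `positive_fraction` is a hypothetical stand-in for the weighting used in PRS.
    """
    ones = [cell for cell, y in labels.items() if y == 1]
    zeros = [cell for cell, y in labels.items() if y == 0]
    n_ones = min(len(ones), round(n_samples * positive_fraction))
    n_zeros = min(len(zeros), n_samples - n_ones)
    return rng.sample(ones, n_ones) + rng.sample(zeros, n_zeros)

def nearby_sample(labels, n_seeds, radius, selective, rng):
    """Pick random seed cells and add every grid cell within `radius` steps.

    selective=False: seeds are drawn from all cells (NNS-like).
    selective=True:  seeds are drawn from landslide cells only (SNS-like).
    """
    pool = [cell for cell, y in labels.items() if y == 1 or not selective]
    seeds = rng.sample(pool, min(n_seeds, len(pool)))
    chosen = set()
    for row, col in seeds:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                neighbour = (row + dr, col + dc)
                if neighbour in labels:  # ignore positions outside the grid
                    chosen.add(neighbour)
    return sorted(chosen)

# Toy 10x10 grid with one hypothetical 3x3 cluster of landslide cells.
labels = {(r, c): int(3 <= r <= 5 and 3 <= c <= 5)
          for r in range(10) for c in range(10)}
rng = random.Random(0)

prs = proportional_random_sample(labels, 12, 0.5, rng)
nns = nearby_sample(labels, n_seeds=3, radius=1, selective=False, rng=rng)
sns = nearby_sample(labels, n_seeds=3, radius=1, selective=True, rng=rng)
```

In this sketch the nearby strategies deliberately pull in spatially clustered cells, which is the kind of spatial correlation the abstract identifies as harmful to regression-based models, whereas the PRS-like draw spreads the sample across the grid.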
The shortest computation time is achieved by LR for all sampling strategies. The LR-PRS, FLR-PRS and BLR-Whole Data set-ups yield the best fits with respect to COD values. PRS has a better overall performance than the other adopted sampling strategies. Avoidance of spatial correlation in the data set is critical for the QLSM model's performance.
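The text does not expand the abbreviation COD. Assuming it denotes Tjur's coefficient of discrimination, a goodness-of-fit measure for logistic regression, it is the mean fitted probability over the observed 1s minus the mean fitted probability over the observed 0s. The sketch below uses made-up labels and probabilities purely for illustration; the function name is hypothetical.

```python
def coefficient_of_discrimination(y_true, p_hat):
    """Tjur's coefficient of discrimination.

    Mean fitted probability of the observed 1s minus that of the observed 0s.
    Values near 1 indicate strong separation; values near 0 indicate none.
    """
    ones = [p for y, p in zip(y_true, p_hat) if y == 1]
    zeros = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not ones or not zeros:
        raise ValueError("both classes must be present")
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Made-up example: two landslide cells and two non-landslide cells.
y = [1, 1, 0, 0]
p = [0.9, 0.7, 0.2, 0.4]
cod = coefficient_of_discrimination(y, p)  # approximately 0.8 - 0.3 = 0.5
```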
