Digital mapping of soil classes using spatial extrapolation with imbalanced data

Abstract: Digital mapping of soil classes by spatial extrapolation saves time and cost, and it helps to obtain soil information for areas where sampling is difficult. However, the approach has not yet been widely explored for soil classes. This study evaluates and compares several machine learning and regression algorithms for the extrapolation of soil sub-groups. It also addresses the imbalanced number of observations per class: an oversampling technique was applied to the minority soil class to improve model performance. The study area is located in north-central Iran, with 84 and 72 soil profiles sampled in the donor and recipient areas, respectively. A set of environmental covariates, including remotely sensed data, digital elevation model derivatives and a geomorphology map, was used as explanatory variables for predicting soil classes. Among the eleven models investigated, the C5.0 decision tree (DT), random forest (RF) and multinomial logistic regression (MNL) had the highest overall accuracies for the extrapolation of soil classes, at 46%, 42% and 38%, respectively. The Kappa statistic values for these models were 0.30, 0.24 and 0.22, respectively. Oversampling the minority soil class increased the overall accuracy of some of the models, the highest values being 53% for DT and 50% for RF, and raised the Kappa values of the DT and RF models to 0.39 and 0.35, respectively. In addition, oversampling prevented the minority soil class from being lost in the final map.
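
The abstract reports overall accuracy and the Kappa statistic before and after oversampling the minority class. The following minimal Python sketch illustrates those three ingredients. It assumes simple random oversampling with replacement, since the abstract does not say which oversampling technique was used, and the function names and the class labels `A`, `B` and `C` are hypothetical. It is an illustration of the general idea, not the study's implementation.

```python
import random
from collections import Counter


def oversample_minority(features, labels, seed=0):
    """Duplicate randomly chosen rows of the rarest class until it is as
    frequent as the largest class (assumed technique: random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)   # rarest class label
    target = max(counts.values())            # size of the largest class
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx) for _ in range(target - counts[minority])]
    new_features = list(features) + [features[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_features, new_labels


def overall_accuracy(observed, predicted):
    """Fraction of validation sites whose class was predicted correctly."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def cohen_kappa(observed, predicted):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).
    Undefined (division by zero) when chance agreement p_e equals 1."""
    n = len(observed)
    p_o = overall_accuracy(observed, predicted)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: class "C" is the minority class.
labels = ["A"] * 5 + ["B"] * 3 + ["C"]
features = [[float(i)] for i in range(len(labels))]
balanced_features, balanced_labels = oversample_minority(features, labels)

observed = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
accuracy = overall_accuracy(observed, predicted)   # 0.75
kappa = cohen_kappa(observed, predicted)           # 0.5
```

In a donor-to-recipient setting, oversampling would be applied only to the training (donor) data, so that accuracy and Kappa are still computed on untouched validation observations.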
