Soil Classification Based on Physical and Chemical Properties Using Random Forests

Soil classification is a method of encoding the most relevant information about a given soil, namely its composition and characteristics, in a single class, to be used in areas like agriculture and forestry. In this paper, we evaluate how confidently soil classes of the World Reference Base classification system can be predicted from the physical and chemical characteristics of a soil's layers. The Random Forests classifier was used with data collected in Mexico, consisting of 6 760 soil profiles composed of 19 464 horizons. Four methods of modelling the data were tested: standard depths, n first layers, thickness, and area weighted thickness. We also tuned the parameters of the classifier and of a k-NN imputation algorithm used to handle missing data. Under-represented classes showed significantly worse results, as they were repeatedly predicted as one of the majority classes. The best method to model the data was found to be the n first layers approach, with missing values imputed with k-NN (\(k=1\)). The results yielded Kappa values from 0.36 to 0.48, in line with state-of-the-art methods, which mostly use remote sensing data.
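To make the n first layers representation and the Kappa statistic concrete, the following is a minimal Python sketch, not the implementation used in the paper. It assumes that each horizon is a dictionary with a `top` depth and a few property values, that the n topmost horizons are flattened into one fixed-length row per profile, and that gaps are left as `None` for a later imputation step. The function names, the property names (`clay`, `ph`), and the example labels are hypothetical. Kappa is computed as Cohen's kappa, i.e. observed agreement corrected for the agreement expected by chance.

```python
from collections import Counter


def first_n_layers(horizons, n, properties):
    """Flatten the n topmost horizons of one profile into a fixed-length row.

    horizons: list of dicts, each with a 'top' depth and property values.
    Missing properties and missing layers become None (to be imputed later).
    """
    ordered = sorted(horizons, key=lambda h: h["top"])[:n]
    row = []
    for i in range(n):
        layer = ordered[i] if i < len(ordered) else {}
        row.extend(layer.get(p) for p in properties)
    return row


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected) / (1 - expected).

    Undefined (division by zero) when chance agreement is exactly 1,
    e.g. when both label lists contain a single identical class.
    """
    total = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / total
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical profile with two horizons, given out of depth order.
profile = [
    {"top": 30, "clay": 40.0, "ph": 6.1},
    {"top": 0, "clay": 25.0},  # pH was not measured in this horizon
]
row = first_n_layers(profile, n=3, properties=["clay", "ph"])
print(row)  # [25.0, None, 40.0, 6.1, None, None]

# Illustrative labels only: one minority-class profile is predicted
# as the majority class.
y_true = ["Leptosol", "Leptosol", "Vertisol", "Vertisol"]
y_pred = ["Leptosol", "Leptosol", "Vertisol", "Leptosol"]
print(cohen_kappa(y_true, y_pred))  # 0.5
```

In this toy example, raw accuracy is 0.75 while Kappa is 0.5, which illustrates why Kappa is the more conservative measure when class frequencies are unbalanced.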
