Assessment of spatial hybrid methods for predicting soil organic matter using DEM derivatives and soil parameters

Abstract This paper assesses hybrid spatial models that combine machine learning algorithms with auxiliary variables for predicting soil organic matter (OM) content in the Kastoria area (Greece). The machine learning methods used are random forests (RF) and gradient boosting (GB), both ensemble methods that combine multiple Classification and Regression Trees (CART). In total, six methods are evaluated: Ordinary Kriging (OK), Regression Kriging (RK), Random Forest (RF), Random Forest Kriging (RFK), Gradient Boosting (GB) and Gradient Boosting Kriging (GBK). According to the findings of the study, the machine learning methods (RF and GB) improve prediction accuracy. The improvement ranged from 6% to 9% for RMSE, 47% to 250% for R² and 4% to 11% for MAE. Moreover, kriging the residuals (the hybrid methods) increases prediction accuracy further, by 1% to 34%. It is also notable that the measured collocated soil parameters used as auxiliary variables consistently have more influence than the environmental parameters, as shown by a higher Pearson correlation coefficient for multiple linear regression (MLR) and higher variable importance for RF and GB. The main reason could be the flat terrain and the rather homogeneous study area, which minimize the effect of topography on the soils. Therefore, the topography and spatial characteristics of an area should be considered in the design phase in order to choose appropriate secondary information for predicting soil parameters.
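The accuracy figures above are relative changes in RMSE, R² and MAE between methods. The sketch below shows one common way such figures can be computed; it is an illustration only, not the authors' procedure. The abstract does not state how R² or the percentage improvement were defined, so the coefficient-of-determination form of R², the helper name `relative_improvement`, and all sample values are assumptions made for this example.

```python
import math


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline, candidate, higher_is_better=False):
    """Percentage improvement of `candidate` over `baseline` (hypothetical helper).

    For error metrics (RMSE, MAE) a decrease counts as improvement;
    for R² an increase counts as improvement.
    """
    change = (candidate - baseline) / abs(baseline)
    return 100.0 * (change if higher_is_better else -change)


# Made-up validation data, NOT taken from the study.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 2.0, 4.0, 3.0]    # stands in for a baseline method
candidate_pred = [1.5, 2.0, 3.5, 3.5]   # stands in for an improved method

rmse_gain = relative_improvement(rmse(observed, baseline_pred),
                                 rmse(observed, candidate_pred))
mae_gain = relative_improvement(mae(observed, baseline_pred),
                                mae(observed, candidate_pred))
r2_gain = relative_improvement(r2(observed, baseline_pred),
                               r2(observed, candidate_pred),
                               higher_is_better=True)

print(f"RMSE improvement: {rmse_gain:.1f}%")
print(f"MAE improvement:  {mae_gain:.1f}%")
print(f"R2 improvement:   {r2_gain:.1f}%")
```

Because R² is often small for a weak baseline, its relative change can exceed 100% even when the error metrics improve only modestly, which is consistent with the much wider range reported for R² than for RMSE and MAE.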
