A high resolution map of soil types and physical properties for Cyprus: A digital soil mapping optimization

Fine-resolution soil maps are important inputs for many environmental studies. Digital soil mapping techniques offer a cost-effective way to obtain detailed information about soil types and soil properties over large areas. The main objective of this study was to extend predictions from 1:25,000 legacy soil surveys (including WRB soil groups, soil depth and soil texture classes) to the larger area of Cyprus. A multiple-trees classification technique, Random Forest (RF), was applied. Specific objectives were: (i) to analyze the role and importance of a large data set of environmental predictors; (ii) to investigate the effect of the number of training points, forest size (ntree), the number of predictors sampled per node (mtry) and tree size (nodesize) in RF; and (iii) to compare RF-derived maps with maps derived with a multinomial logistic regression model, in terms of validation error (test set and independent profiles) and map uncertainty, using the confusion index and a newly developed reliability index. The optimized RF model was run using half of the available input points (over a million) and with ntree equal to 350. The mtry parameter was set to 5 (close to half the number of environmental variables used) for both soil series and soil properties. Calibrating nodesize gave no relevant performance increase, so it was kept at its default value (1). In terms of environmental variables, the model used 10 predictors, covering all the soil formation factors considered in the scorpan formula, to derive the three maps. Soil properties derived from geochemistry data were highly important in predicting soil groups, depth and texture. Random Forest produced a better predictive model than multinomial logistic regression, showing comparable predictive uncertainty but much lower validation error.
The RF-derived maps show very low out-of-bag (OOB) errors (around 10% for both soil groups and soil properties) but relatively high validation errors from independent profiles (45% for soil depth, 51% for soil texture). The resulting reliability index was low in the main mountainous area of Cyprus, where predictions were extrapolations, as indicated by the multivariate environmental similarity surface, but medium to high in the main agricultural areas of the country.
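
The abstract uses the confusion index as a map-uncertainty measure but does not define it. As a minimal sketch, assuming the common definition (one minus the difference between the highest and second-highest predicted class probabilities at a location), it could be computed as follows. The function name and the example probabilities are hypothetical and are not taken from the study; the newly developed reliability index is not reproduced here because the abstract gives no formula for it.

```python
def confusion_index(probabilities):
    """Confusion index for one location: 1 - (p_max - p_second).

    Assumed definition (not stated in the abstract): values near 1 mean
    the two most likely classes are almost equally probable (high
    confusion); values near 0 mean one class clearly dominates.
    """
    if len(probabilities) < 2:
        raise ValueError("need probabilities for at least two classes")
    ordered = sorted(probabilities, reverse=True)
    return 1.0 - (ordered[0] - ordered[1])


# Hypothetical per-class probabilities for two map cells, e.g. the share
# of trees in a forest voting for each class.
pixels = {
    "confident": [0.90, 0.05, 0.05],
    "ambiguous": [0.45, 0.40, 0.15],
}

ci = {name: confusion_index(p) for name, p in pixels.items()}
for name, value in ci.items():
    print(f"{name}: CI = {value:.2f}")
```

Under this definition the "ambiguous" cell scores much higher than the "confident" one, which is the kind of contrast an uncertainty map built from class probabilities is meant to show.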
