Regional mapping of soil parent material by machine learning based on point data

Abstract: A machine learning system (MART) was used to predict soil parent material (SPM) at the regional scale with a 50-m resolution. Point-specific soil observations were tested as training data in place of the soil maps used in previous studies, with the aim of distributing training data more evenly over the study area and reducing information uncertainty. The 27,020-km² study area (Brittany, northwestern France) contains mainly metamorphic, igneous and sedimentary substrates. However, superficial deposits (aeolian loam, colluvial and alluvial deposits) very often constitute the actual SPM and are typically under-represented in existing geological maps. To calibrate the predictive model, 4920 point soil descriptions were used as training data along with 17 environmental predictors (terrain attributes derived from a 50-m DEM, emissions of K, Th and U measured by airborne gamma-ray spectrometry, geological variables at the 1:250,000 scale, and land use maps derived from remote sensing). Model predictions were then compared with: i) point data not used in model calibration, during SPM model creation (internal validation), ii) the entire point dataset (point validation), and iii) existing detailed soil maps (external validation). The internal, point and external validation accuracy rates were 56%, 81% and 54%, respectively. Aeolian loam was one of the three best-predicted substrates. Poor predictions were associated with uncommon materials and with areas of high geological complexity, i.e. areas where the existing maps used for external validation were also imprecise. The resulting predictive map was more accurate than existing geological maps and, moreover, indicated surface deposits whose spatial coverage is consistent with current knowledge of the area.
This method proves useful for predicting SPM in areas where conventional mapping techniques might be too costly or time-consuming, or where soil maps are insufficient for use as training data. In addition, it produces repeatable and interpretable results whose accuracy can be assessed objectively.
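
The accuracy rates quoted above are proportions of validation sites where the predicted SPM class matches the observed one. As a minimal sketch of that calculation (not the authors' code), the Python below computes an overall accuracy rate and a per-class rate from paired labels. The function name `accuracy_rates` and the toy labels are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import Counter


def accuracy_rates(observed, predicted):
    """Return (overall accuracy, per-class accuracy) for paired class labels.

    Hypothetical helper: per-class accuracy is the share of sites observed
    in a class that the model also assigned to that class.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")

    totals = Counter(observed)  # number of sites observed in each class
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)

    overall = sum(hits.values()) / len(observed)
    per_class = {cls: hits[cls] / n for cls, n in totals.items()}
    return overall, per_class


# Toy labels for illustration only (assumed, not taken from the study).
observed = ["aeolian loam", "granite", "schist",
            "aeolian loam", "alluvium", "granite"]
predicted = ["aeolian loam", "granite", "granite",
             "aeolian loam", "colluvium", "granite"]

overall, per_class = accuracy_rates(observed, predicted)
print(f"overall accuracy: {overall:.2f}")
for cls, rate in sorted(per_class.items()):
    print(f"{cls}: {rate:.2f}")
```

Applying the same function to held-out points, to the full point dataset, or to labels read from an existing soil map would correspond to the internal, point and external validations respectively, assuming each is expressed as paired observed and predicted classes.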
