Spatial structure, parameter nonlinearity, and intelligent algorithms in constructing pedotransfer functions from large-scale soil legacy data

The pedotransfer function (PTF) approach is a convenient way to estimate difficult-to-measure soil properties from basic soil data. Typically, PTFs are developed by training and testing a predictive model on a large number of samples collected from small (regional) areas. National soil legacy databases offer an alternative source of soil data for developing PTFs, although legacy data are sparsely distributed over large areas. Here, we examined the Indian soil legacy (ISL) database to select a comprehensive training dataset for estimating cation exchange capacity (CEC) as a test case for the PTF approach. Geostatistical and correlation analyses showed that the legacy data contain the diverse spatial and correlation structures needed to build robust PTFs. Using non-linear correlation measures and intelligent predictive algorithms, we developed a methodology to extract an efficient training dataset from the ISL data for estimating CEC with high prediction accuracy. In the selected data, the training and test datasets had comparable spatial variation and parameter nonlinearity. Thus, we identified specific indicators for constructing robust PTFs from legacy data. Our results open a new avenue for using the large volume of existing soil legacy data to develop region-specific PTFs without the need to collect new soil data.
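The abstract does not say which non-linear correlation measure was used. As one illustrative possibility, the sketch below computes the sample distance correlation, a dependence measure that can detect non-linear relationships that the Pearson coefficient misses. The function name, the variable names, and the synthetic data are assumptions made for illustration only; they are not taken from the study or from the ISL database.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (illustrative sketch)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of samples")

    # Pairwise absolute-distance matrices for each variable.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)

    # Double-centre each matrix: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # at least one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Synthetic example (hypothetical data): a purely quadratic relationship.
predictor = np.linspace(-1.0, 1.0, 201)
response = predictor ** 2

pearson_r = float(np.corrcoef(predictor, response)[0, 1])
dcor = distance_correlation(predictor, response)
print(f"Pearson r = {pearson_r:.3f}, distance correlation = {dcor:.3f}")
```

On this symmetric quadratic example the Pearson coefficient is close to zero while the distance correlation is clearly positive. This is the kind of contrast a non-linear measure could reveal when comparing the parameter relationships in candidate training and test datasets.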
