Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes

Abstract: We use an empirical method in which model output uncertainty is expressed as a prediction interval (PI) derived from the underlying distribution of prediction errors. This method obviates the need to identify each source of uncertainty and determine its contribution to the overall prediction uncertainty. Conceptually, in the context of digital soil mapping, rather than a single point estimate at every prediction location, a PI, characterised by upper and lower prediction limits, encloses the prediction (which lies somewhere on the interval) and ideally also encloses the true but unknown value of the target variable 100(1 − α)% of the time on average (typically 95%). The idea is to partition the environmental covariate feature space into clusters that share similar attributes, using fuzzy k-means with extragrades. Model error in predicting a target variable is then estimated, and cluster PIs are constructed from the empirical distribution of the errors associated with the observations belonging to each cluster. PIs for each non-calibration observation are then formulated on the basis of its grade of membership in each cluster. We demonstrate how this method can be applied to mapping continuous soil depth functions. First, using soil depth functions and digital soil mapping (DSM) methods, we map the continuous vertical and lateral distribution of organic carbon (OC) and available water capacity (AWC) across the Edgeroi district in north-western NSW, Australia. From those predictions we define a continuous PI for each prediction node, generating upper and lower prediction limits for both attributes. Preliminary results from an external validation dataset are encouraging: 91% and 93% of the OC and AWC observations, respectively, fall within the bounds of their 95% PIs. Ideally, 95% of instances should fall within these bounds.
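The workflow described in the abstract (cluster the covariate space, take empirical error quantiles per cluster, then weight the cluster limits by membership) can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration and are not taken from the paper: plain fuzzy k-means is used in place of fuzzy k-means with extragrades, calibration observations are assigned to the cluster of highest membership, the limits for a new observation are a membership-weighted sum of the cluster limits, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def fuzzy_memberships(X, centroids, phi=2.0):
    """Fuzzy k-means memberships from squared Euclidean distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)            # guard against division by zero
    inv = d2 ** (-1.0 / (phi - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_kmeans(X, k, phi=2.0, n_iter=100, seed=0):
    """Plain fuzzy k-means (no extragrade class); returns the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        w = fuzzy_memberships(X, centroids, phi) ** phi
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
    return centroids

def cluster_error_limits(errors, memberships, alpha=0.05):
    """Empirical lower/upper error quantiles for each cluster.

    Assumption: each calibration observation belongs to the cluster in
    which it has the highest membership.
    """
    hard = memberships.argmax(axis=1)
    k = memberships.shape[1]
    limits = np.empty((k, 2))
    for j in range(k):
        e = errors[hard == j]
        if e.size == 0:                   # empty cluster: fall back to all errors
            e = errors
        limits[j] = np.quantile(e, [alpha / 2, 1 - alpha / 2])
    return limits

def prediction_limits(pred, memberships, limits):
    """Membership-weighted cluster limits added to the point prediction."""
    lower = pred + memberships @ limits[:, 0]
    upper = pred + memberships @ limits[:, 1]
    return lower, upper

def make_data(rng, n=200):
    """Synthetic data: two covariate clusters with different error spread."""
    X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(10.0, 1.0, (n, 2))])
    pred = X.sum(axis=1)                  # stand-in for a model prediction
    err = np.concatenate([rng.normal(0.0, 0.5, n), rng.normal(0.0, 2.0, n)])
    return X, pred, pred + err            # covariates, predictions, observations

rng = np.random.default_rng(42)
X_cal, pred_cal, obs_cal = make_data(rng)
X_val, pred_val, obs_val = make_data(rng)

# Calibration: cluster the covariates, then derive per-cluster error limits
centroids = fuzzy_kmeans(X_cal, k=2)
m_cal = fuzzy_memberships(X_cal, centroids)
limits = cluster_error_limits(obs_cal - pred_cal, m_cal, alpha=0.05)

# Validation: build PIs from memberships and count enclosed observations
m_val = fuzzy_memberships(X_val, centroids)
lower, upper = prediction_limits(pred_val, m_val, limits)
coverage = np.mean((obs_val >= lower) & (obs_val <= upper))
print(f"Share of validation observations inside the 95% PI: {coverage:.3f}")
```

The final coverage figure is the same kind of statistic as the 91% and 93% reported above: the proportion of external validation observations that fall within their prediction limits, to be compared with the nominal 95%.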
