Statistical approaches for spatial sample survey: Persistent misconceptions and new developments

Several misconceptions about the design-based approach to sampling and statistical inference, which is based on classical sampling theory, are quite persistent. They arise from confusion about basic statistical concepts such as independence, expectation, and the bias and variance of estimators or predictors. These concepts have a different meaning in the design-based and model-based approaches because the two approaches consider different sources of randomness: in the design-based approach the randomness comes from the random selection of the sampling locations, whereas in the model-based approach it comes from a statistical model of the study variable. Also, a population mean is still often confused with a model mean, and a population variance with a model variance, leading to invalid formulas for the variance of an estimator of the population mean. In this paper the fundamental differences between the two approaches are illustrated with simulations, with the aim of giving more pedometricians a better understanding of this subject. An overview is also presented of how knowledge of the spatial structure of the study variable can be used in the design-based approach. In the second part, new developments in both the design-based and model-based approaches are described that try to combine the strengths of the two.
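
The contrast between the two sources of randomness can be made concrete with a small simulation. The sketch below is not the simulation study of the paper; it is a minimal illustration under assumed, hypothetical settings: a one-dimensional transect of N = 200 locations, a stationary AR(1) model with a correlation of 0.8 between neighbouring locations standing in for spatial autocorrelation, and a sample size of n = 20. In the design-based part the population is one fixed realization and only the simple random sample varies, so the simulated variance of the sample mean should agree with the classical formula (1 - n/N) S^2 / n, whatever the autocorrelation in the population. In the model-based part the sample locations are fixed (a deliberately clustered block of 20 neighbours) and only the model realization varies; the model variance of the sample mean is then much larger than the naive sigma^2 / n that assumes independence.

```python
import random
import statistics


def ar1_transect(n_points, rho, rng):
    """One realization of a stationary AR(1) process (mean 0, variance 1); assumed model."""
    values = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - rho ** 2) ** 0.5
    for _ in range(n_points - 1):
        values.append(rho * values[-1] + rng.gauss(0.0, innovation_sd))
    return values


def design_based(population, n, reps, rng):
    """Population values are fixed; the only randomness is the SRS-without-replacement draw."""
    estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    big_n = len(population)
    s2 = statistics.variance(population)  # population variance S^2 (divisor N - 1)
    return {
        "population_mean": statistics.fmean(population),
        "mean_of_estimates": statistics.fmean(estimates),
        "simulated_variance": statistics.pvariance(estimates),
        "srs_formula_variance": (1.0 - n / big_n) * s2 / n,
    }


def model_based(locations, n_points, rho, reps, rng):
    """Sample locations are fixed; the only randomness is the model realization."""
    n = len(locations)
    squared_deviations = []
    for _ in range(reps):
        field = ar1_transect(n_points, rho, rng)
        sample_mean = statistics.fmean([field[i] for i in locations])
        squared_deviations.append(sample_mean ** 2)  # deviation from the model mean 0
    # Exact model variance of the sample mean: (1/n^2) * sum_ij rho^|i - j|
    exact = sum(rho ** abs(i - j) for i in locations for j in locations) / n ** 2
    return {
        "simulated_variance": statistics.fmean(squared_deviations),
        "exact_model_variance": exact,
        "naive_iid_variance": 1.0 / n,  # sigma^2 / n, valid only without autocorrelation
    }


rng = random.Random(42)
N_POINTS, RHO, N_SAMPLE = 200, 0.8, 20  # hypothetical settings

# Design-based: one fixed population, many random samples.
population = ar1_transect(N_POINTS, RHO, rng)
db = design_based(population, N_SAMPLE, reps=5000, rng=rng)

# Model-based: one fixed (clustered) sample, many realizations of the model.
mb = model_based(list(range(N_SAMPLE)), N_POINTS, RHO, reps=2000, rng=rng)

print("design-based:", db)
print("model-based:", mb)
```

With these settings the exact model variance of the sample mean is about 0.351, roughly seven times the naive value of 0.05. The design-based formula uses S^2 of the one fixed population, whereas the model-based variance depends on the assumed covariance model; substituting one for the other is the kind of confusion described above.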
