In studies that involve a finite sample of spatial data, it is often of interest to test statistically the assumption that the marginal (univariate) distribution of the data is Gaussian (normal). This may be important per se because, for example, a data transformation may be desired if the normality hypothesis is rejected, or it may provide a way of testing other hypotheses, such as lognormality, by testing the normality of the logarithms of the observations. The most commonly used tests, such as the Kolmogorov–Smirnov (K–S), chi-square (χ²), and Shapiro–Wilk (S–W) tests, are designed on the assumption that the observations are independent and identically distributed (iid). In geostatistical applications, however, this is not usually the case unless the spatial covariance (semivariogram) function is a pure nugget variance. If the covariance structure has a (practical) range greater than the minimum distance between observations, the data are correlated and the standard tests cannot be applied to the probability density function (pdf) or cumulative distribution function (cdf) estimated directly from the data. The problem with correlated data arises not from the correlation per se but from cases in which correlated data are clustered rather than located on a regular grid. In these cases, inferences that require the iid assumption may be seriously biased because of the spatial correlation among the observations. If unbiased (i.e., de-clustered) estimates of the pdf or cdf are obtained, then normality tests such as K–S, χ², or S–W can be applied using the unbiased estimates and an effective number of samples equivalent to the iid case.
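As a rough illustration of applying a K–S-type comparison to a de-clustered cdf, the sketch below computes the maximum distance between a weighted empirical cdf and a normal cdf fitted with the weighted mean and variance. The function name `weighted_ks_normal` and the idea of passing the de-clustering weights as a plain array are assumptions made for illustration; the weights themselves are taken as given, and the sketch returns only the distance, not a test decision, since the critical value depends on the effective number of samples and on the fact that the parameters are estimated.

```python
import math
import numpy as np

def weighted_ks_normal(values, weights):
    """Max distance between a weighted empirical cdf and a fitted normal cdf.

    `weights` are hypothetical de-clustering weights (any positive scale);
    they are normalised to sum to one before use.
    """
    z = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Sort observations and carry the weights along.
    order = np.argsort(z)
    z, w = z[order], w[order]

    # Weighted (de-clustered) mean and standard deviation.
    mean = np.sum(w * z)
    std = math.sqrt(np.sum(w * (z - mean) ** 2))

    # Normal cdf evaluated at each sorted observation.
    cdf_model = np.array(
        [0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0)))) for v in z]
    )

    # The weighted empirical cdf is a step function: compare just after
    # and just before each jump.
    cdf_after = np.cumsum(w)
    cdf_before = cdf_after - w
    return float(max(np.max(cdf_after - cdf_model), np.max(cdf_model - cdf_before)))
```

With equal weights this reduces to the ordinary K–S distance against a normal distribution fitted to the sample, so any difference between the weighted and unweighted distances reflects the effect of de-clustering alone.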
There are three questions to be addressed in these cases:
• Is the distribution ergodic?
• How are unbiased estimates of the pdf and cdf obtained from clustered samples?
• What is the effective number of samples equivalent to the iid case?
Working within the framework of the universal model (generalized linear model) in which a spatial process, Z(x), is composed of a deterministic drift m(x) and an (auto-) correlated residual e(x), Z(x) = m(x) + e(x), the assumption of distribution ergodicity (an assumption that can be checked from the experimental data) implies that the normality test should be applied to the variable, Z(x), if the drift is constant (m(x) = m), and to the residual variable if the drift is variable. We show that an efficient method for obtaining unbiased estimates of the pdf or cdf is by weighting the observations (i.e., de-clustering) using block kriging. Block kriging requires an estimate of the semivariogram and we present a new method of semivariogram estimation that is robust with respect to data clustering. In addition, we discuss a way of determining the effective number of samples required for the application of a normality test and for constructing confidence intervals for statistics such as the mean and variance. The method is illustrated using a published data set.
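One common way to make the idea of an effective number of samples concrete is through the variance of a weighted mean: for weights w that sum to one and a correlation matrix R between sample locations, Var(Σ wᵢ Zᵢ) = σ² wᵀRw, so the equivalent iid sample size is 1 / (wᵀRw). The sketch below uses this textbook definition, which is not necessarily the one adopted by the method described above. The exponential correlation model, the factor 3 used to express its practical range, and the function name `effective_sample_size` are all assumptions made for illustration.

```python
import numpy as np

def effective_sample_size(coords, corr_range, weights=None):
    """Equivalent iid sample size for a weighted mean of correlated data.

    Assumes an exponential correlation rho(h) = exp(-3 h / corr_range),
    where corr_range is the practical range (illustrative choice only).
    """
    x = np.asarray(coords, dtype=float)
    n = x.shape[0]

    # Equal weights by default; otherwise normalise the supplied weights.
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()

    # Pairwise distances between sample locations.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

    # Correlation matrix under the assumed model.
    rho = np.exp(-3.0 * dist / corr_range)

    # Var(weighted mean) = sigma^2 * w' R w, versus sigma^2 / n for iid data.
    return float(1.0 / (w @ rho @ w))
```

Under this definition, widely separated samples give a value close to the actual sample count, while a tight cluster of samples counts for little more than a single observation.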