|Title:||Normality tests for spatially correlated data|
|Citation:||Mathematical Geosciences, 2004; 36(6):659-681|
|Publisher:||Kluwer Academic/Plenum Publ|
|Abstract:||In studies that involve a finite sample of spatial data it is often of interest to test (statistically) the assumption that the marginal (or univariate) distribution of the data is Gaussian (normal). This may be important per se because, for example, a data transformation may be desired if the normality hypothesis is rejected, or it may provide a way of testing other hypotheses, such as lognormality, by testing the normality of the logarithms of the observations. The most commonly used tests, such as the Kolmogorov–Smirnov (K–S), chi-square (χ²), and Shapiro–Wilk (S–W) tests, are designed on the assumption that the observations are independent and identically distributed (iid). In geostatistical applications, however, this is not usually the case unless the spatial covariance (semivariogram) function is a pure nugget variance. If the covariance structure has a (practical) range greater than the minimum distance between observations, the data are correlated and the standard tests cannot be applied to the probability density function (pdf) or cumulative distribution function (cdf) estimated directly from the data. The problem with correlated data arises not from the correlation per se but from cases in which correlated data are clustered rather than being located on a regular grid. In these cases inferences requiring iid assumptions may be seriously biased because of the spatial correlation among the observations. If unbiased (i.e., de-clustered) estimates of the pdf or cdf are obtained, then normality tests, such as K–S, χ², or S–W, can be applied using the unbiased estimates and an effective number of samples equivalent to the iid case. There are three questions to be addressed in these cases: Is the distribution ergodic? How are unbiased estimates of the pdf and cdf obtained from clustered samples? What is the effective number of samples equivalent to the iid case?
Working within the framework of the universal model (generalized linear model) in which a spatial process, Z(x), is composed of a deterministic drift m(x) and an (auto-) correlated residual e(x), Z(x) = m(x) + e(x), the assumption of distribution ergodicity (an assumption that can be checked from the experimental data) implies that the normality test should be applied to the variable, Z(x), if the drift is constant (m(x) = m), and to the residual variable if the drift is variable. We show that an efficient method for obtaining unbiased estimates of the pdf or cdf is by weighting the observations (i.e., de-clustering) using block kriging. Block kriging requires an estimate of the semivariogram and we present a new method of semivariogram estimation that is robust with respect to data clustering. In addition, we discuss a way of determining the effective number of samples required for the application of a normality test and for constructing confidence intervals for statistics such as the mean and variance. The method is illustrated using a published data set.|
|Description:||© 2004 International Association for Mathematical Geology|
|Appears in Collections:||Aurora harvest; Civil and Environmental Engineering publications|
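Illustrative note (not part of the published record): the abstract describes applying a normality test to a de-clustered cdf together with an effective number of samples. The sketch below shows one way such a calculation might look. It is not the authors' method: the de-clustering weights are taken as given (the paper obtains them by block kriging, which is not reproduced here), the exponential correlation model with factor 3 for the practical range is an assumed convention, the effective sample size 1 / (w' R w) is a standard variance-of-the-weighted-mean result chosen for illustration, and all function names are hypothetical.

```python
import math
import numpy as np


def exponential_correlation(coords, practical_range):
    """Assumed correlation model: rho(h) = exp(-3 h / practical_range)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distances
    return np.exp(-3.0 * h / practical_range)


def effective_sample_size(weights, corr):
    """n_eff = 1 / (w' R w) for weights normalised to sum to one.

    This is the number of iid samples whose mean would have the same
    variance as the weighted mean of the correlated data.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(1.0 / (w @ np.asarray(corr, dtype=float) @ w))


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def weighted_ks_statistic(values, weights):
    """K-S distance between the weighted empirical cdf and a normal cdf
    whose mean and standard deviation are the weighted estimates."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    order = np.argsort(v)
    v, w = v[order], w[order]
    mean = np.sum(w * v)
    std = math.sqrt(np.sum(w * (v - mean) ** 2))
    upper = np.cumsum(w)   # weighted cdf just after each data value
    lower = upper - w      # weighted cdf just before each data value
    model = np.array([normal_cdf(x, mean, std) for x in v])
    return float(max(np.max(upper - model), np.max(model - lower)))
```

In use, the statistic returned by `weighted_ks_statistic` would be compared against a K–S critical value evaluated at the effective sample size rather than at the raw number of observations. With equal weights and uncorrelated data the effective sample size reduces to the raw count.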