The influence of data parameters (sensor error, unexplained variance, sampling density and data distribution) on the quality of spatial data prediction is studied using a spatial data simulator. The performance of linear regression models and non-linear models (feedforward neural networks) is compared on simulated agricultural data, but the results can be generalized to geological, oceanographic and other spatial domains. For a highly non-linear response variable, non-linear models are shown to perform better regardless of unexplained variance and sensor error. However, linear models outperform non-linear models when the sampling density of the spatial data is not sufficient to produce accurate interpolated values. In the presence of non-homogeneous data distributions, a significant improvement in prediction quality can be achieved by using specialized local models, provided that the distributions are properly identified.
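The last point, that local models help when the data come from more than one distribution, can be illustrated with a minimal sketch. This is not the simulator or the models used in the paper: the two-region layout, the coefficients, the noise level and the helper names `fit_linear` and `mse` are all assumptions chosen for illustration, and the region labels are taken as known, which corresponds to the case where the distributions have already been properly identified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated field: one driving attribute x and a response y
# whose dependence on x differs between two spatial regions.
n = 400
x = rng.uniform(0.0, 1.0, size=n)
region = (rng.uniform(size=n) < 0.5).astype(int)  # assumed known region labels
noise = rng.normal(0.0, 0.05, size=n)             # assumed sensor error level
y = np.where(region == 0, 2.0 * x + 1.0, -3.0 * x + 4.0) + noise


def fit_linear(x_fit, y_fit):
    """Ordinary least squares for y = a*x + b; returns the array [a, b]."""
    design = np.column_stack([x_fit, np.ones_like(x_fit)])
    coef, *_ = np.linalg.lstsq(design, y_fit, rcond=None)
    return coef


def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hold out the last 100 points for testing.
train = np.arange(n) < 300
test = ~train

# One global linear model fitted on all training points.
a, b = fit_linear(x[train], y[train])
global_mse = mse(y[test], a * x[test] + b)

# One local linear model per region, each applied only to its own region.
local_pred = np.empty(int(test.sum()))
for r in (0, 1):
    in_region = train & (region == r)
    a_r, b_r = fit_linear(x[in_region], y[in_region])
    mask = region[test] == r
    local_pred[mask] = a_r * x[test][mask] + b_r
local_mse = mse(y[test], local_pred)

print(f"global model MSE: {global_mse:.4f}")
print(f"local models MSE: {local_mse:.4f}")
```

In this toy setting the local models recover each region's relationship up to the noise level, while the single global model averages two conflicting trends. With mislabelled regions the advantage would shrink, which is why the abstract's proviso matters.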