Understanding the influence of noise, sampling density and data distribution on spatial prediction quality through the use of simulated data

The influence of data parameters (sensor error, unexplained variance, sampling density and data distribution) on the quality of spatial data prediction is examined using a spatial data simulator. The performance of linear regression models and non-linear regression models (feedforward neural networks) is compared on simulated agricultural data, but the results can be generalized to geological, oceanographic and other spatial domains. For a highly non-linear response variable, non-linear models are shown to perform better regardless of unexplained variance and sensor error, but linear models outperform non-linear models when the sampling density of the spatial data is insufficient to produce accurate interpolated values. In the presence of non-homogeneous data distributions, a significant improvement in prediction quality can be achieved by using specialized local models, provided that the distributions are correctly identified.
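
The claim about specialized local models can be illustrated with a minimal sketch. It is not the simulator used in this work: the two-regime response, the coefficients, the noise level and the function names (`simulate`, `fit_line`, `compare`) are all hypothetical, and simple one-variable least squares stands in for the regression models. The regime label of every point is assumed to be known, which corresponds to the case where the distributions are correctly identified.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def simulate(n, noise_sd, rng):
    """Hypothetical simulator: two regimes with different linear responses.

    Returns a list of (regime, x, y) tuples. The coefficients are
    illustrative assumptions, not values from the study.
    """
    data = []
    for _ in range(n):
        regime = rng.choice((0, 1))
        x = rng.uniform(0.0, 10.0)
        y_true = 2.0 + 1.5 * x if regime == 0 else 20.0 - 1.0 * x
        # Additive Gaussian noise stands in for sensor error and
        # unexplained variance.
        data.append((regime, x, y_true + rng.gauss(0.0, noise_sd)))
    return data


def compare(n=400, noise_sd=1.0, seed=0):
    """Return (global_mse, local_mse) measured on a fresh test sample."""
    rng = random.Random(seed)
    train = simulate(n, noise_sd, rng)
    test = simulate(n, noise_sd, rng)

    # One global model fitted to all training points.
    a, b = fit_line([x for _, x, _ in train], [y for _, _, y in train])
    global_mse = sum((a + b * x - y) ** 2 for _, x, y in test) / len(test)

    # One local model per regime; regime labels are assumed known.
    local_sq_err = 0.0
    for r in (0, 1):
        xs = [x for g, x, _ in train if g == r]
        ys = [y for g, _, y in train if g == r]
        a_r, b_r = fit_line(xs, ys)
        local_sq_err += sum(
            (a_r + b_r * x - y) ** 2 for g, x, y in test if g == r
        )
    local_mse = local_sq_err / len(test)
    return global_mse, local_mse


if __name__ == "__main__":
    g, l = compare()
    print(f"global MSE: {g:.2f}, local MSE: {l:.2f}")
```

In this toy setting the local models should approach the noise floor while the single global model cannot, because it has to average two conflicting responses. If the regime labels were assigned incorrectly, the advantage would shrink, which is why the identification of the distributions matters.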