A new spatial prediction method for georeferenced data streams

Massive datasets in the form of continuous streams with no fixed length are becoming very common, owing to the availability of sensor networks that can repeatedly measure some variable at a very high frequency. In many real-world applications, such data streams depend on the geographic location of each sensing device, so records collected by nearby sensors are more likely to be similar than data collected at distant places. This paper proposes a strategy for monitoring the spatial dependence among data streams and for predicting data at spatial locations where no sensor is recording. The strategy is based on distributed processing. At each sensor, the data are summarized by means of a micro-clustering strategy for histogram data. At the central processing node, the spatial dependence is measured and predictions at unsampled locations are obtained through a kriging-based approach. To evaluate the effectiveness of the proposed strategy, we performed extensive tests on real data.
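To make the prediction step at the central node more concrete, the following is a minimal sketch of plain ordinary kriging on scalar values. It is not the paper's method: the abstract does not specify the variogram model or how kriging is extended to the histogram summaries produced at each sensor. The exponential variogram, its parameters (`sill`, `rng`, `nugget`), and all function names here are illustrative assumptions. Given sensor coordinates and one observed value per sensor, the sketch solves the ordinary kriging system (variogram matrix bordered by the unbiasedness constraint that the weights sum to one) and returns the weighted prediction at an unsampled location.

```python
import math


def exp_variogram(h, sill=1.0, rng=1.0, nugget=0.0):
    """Exponential semivariogram (assumed model); gamma(0) = 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Pivoting is required: the kriging matrix has zeros on its diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(sites, values, target, variogram=exp_variogram):
    """Predict the value at `target` from observed `values` at `sites`.

    Returns (prediction, weights). The weights sum to 1 (unbiasedness).
    """
    n = len(sites)
    # Variogram matrix between sensors, bordered by a column of ones ...
    a = [[variogram(math.dist(sites[i], sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    # ... and a final row of ones enforcing sum(weights) == 1.
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(s, target)) for s in sites] + [1.0]
    sol = solve(a, b)
    weights = sol[:n]  # sol[n] is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights


if __name__ == "__main__":
    sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    pred, w = ordinary_kriging(sites, [1.0, 2.0, 3.0, 4.0], (0.5, 0.5))
    print(round(pred, 6))  # 2.5: the centre is equidistant, so weights are equal
```

With a zero nugget, ordinary kriging is an exact interpolator: predicting at a sensor's own location returns that sensor's value. In a streaming setting like the one described above, one could imagine each sensor sending only a compact summary to the central node, which then refits the variogram and recomputes these weights as new summaries arrive; how the paper actually does this for histogram data is not captured by this sketch.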