StreamSamp: DataStream Clustering Over Tilted Windows Through Sampling

This article presents StreamSamp, a new algorithm for summarizing data streams. The approach relies on a fundamental technique: the incoming stream is sampled, and the resulting samples are stored in a way that allows the study of either the entire stream or only a part of it. The algorithm is one-pass and can process large volumes of high-speed data independently of their dimensionality. Its versatility as a pre-processing block is contrasted with other, more dedicated state-of-the-art algorithms, and its performance is illustrated on a clustering task through a comparison with CluStream, a reference algorithm in the field for this task.
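To make the idea of sampling a stream and storing the samples over tilted windows more concrete, the following Python sketch shows one plausible reading of it. It is not taken from the article: the class name `TiltedSampleStore`, the parameters `sample_size`, `max_per_level` and `rate`, and the rule of merging the two oldest samples of a level by keeping half of each at random are all assumptions made for illustration.

```python
import random


class TiltedSampleStore:
    """Illustrative sketch only; all names and parameters are hypothetical."""

    def __init__(self, sample_size=4, max_per_level=2, rate=0.5, seed=0):
        self.sample_size = sample_size      # items per stored sample (assumed)
        self.max_per_level = max_per_level  # samples kept per level before merging (assumed)
        self.rate = rate                    # probability of keeping an incoming item (assumed)
        self.rng = random.Random(seed)
        self.current = []                   # sample currently being filled
        self.levels = []                    # levels[i] holds samples of level i, oldest first

    def add(self, item):
        # Single pass: each item is seen once and kept with probability `rate`.
        if self.rng.random() >= self.rate:
            return
        self.current.append(item)
        if len(self.current) == self.sample_size:
            self._store(self.current, 0)
            self.current = []

    def _store(self, sample, level):
        while len(self.levels) <= level:
            self.levels.append([])
        self.levels[level].append(sample)
        if len(self.levels[level]) > self.max_per_level:
            # Merge the two oldest samples of this level, keeping half of each
            # at random, so that older data is represented more coarsely.
            a = self.levels[level].pop(0)
            b = self.levels[level].pop(0)
            half = self.sample_size // 2
            merged = self.rng.sample(a, half) + self.rng.sample(b, self.sample_size - half)
            self._store(merged, level + 1)

    def stored_items(self):
        # Everything currently retained, across all levels plus the partial sample.
        return [x for lvl in self.levels for s in lvl for x in s] + list(self.current)
```

Under these assumptions, memory grows with the number of levels rather than with the length of the stream, and a downstream task such as clustering could be run on `stored_items()` or on the samples of selected levels only.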