Clustering Algorithm over Uncertain Data Streams

This paper proposes EMicro, a novel algorithm for clustering uncertain data streams. Whereas most existing approaches rely mainly on a distance metric to assess cluster quality, EMicro considers the distance metric and data uncertainty together when measuring clustering quality. A second contribution of this paper is the outlier-processing mechanism. To obtain good performance, two buffers are maintained: one holds normal micro-clusters and the other holds potential outlier micro-clusters. Experimental results show that EMicro outperforms existing methods in both efficiency and effectiveness.
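
As an illustration of the two ideas summarized above, the following is a minimal Python sketch, not the EMicro algorithm itself. It assumes each arriving point carries a scalar uncertainty value, and it scores a candidate micro-cluster by a weighted sum of the distance to the cluster center and the mean uncertainty the cluster would have after absorbing the point. The names `MicroCluster`, `score`, `TwoBufferClusterer`, and the parameters `alpha`, `threshold`, and `promote_count` are hypothetical and chosen only for this example; the actual quality measure and promotion rule are not specified in this summary.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)
class MicroCluster:
    """Hypothetical summary of a group of points: center, size, total uncertainty."""
    center: list
    count: int
    uncertainty_sum: float

    def absorb(self, point, uncertainty):
        # Incrementally update the mean center and the accumulated uncertainty.
        self.count += 1
        self.center = [c + (x - c) / self.count for c, x in zip(self.center, point)]
        self.uncertainty_sum += uncertainty


def score(cluster, point, uncertainty, alpha=0.5):
    """Assumed combined measure (lower is better): distance plus mean uncertainty."""
    dist = math.dist(cluster.center, point)
    merged_unc = (cluster.uncertainty_sum + uncertainty) / (cluster.count + 1)
    return alpha * dist + (1 - alpha) * merged_unc


class TwoBufferClusterer:
    def __init__(self, threshold=1.0, promote_count=3):
        self.threshold = threshold          # assumed acceptance bound on the score
        self.promote_count = promote_count  # assumed size at which an outlier is promoted
        self.normal = []                    # buffer of normal micro-clusters
        self.outliers = []                  # buffer of potential outlier micro-clusters

    def _best(self, buffer, point, uncertainty):
        # Return the lowest-scoring cluster in the buffer if it passes the threshold.
        best = min(buffer, key=lambda c: score(c, point, uncertainty), default=None)
        if best is not None and score(best, point, uncertainty) <= self.threshold:
            return best
        return None

    def insert(self, point, uncertainty):
        # Try the normal buffer first, then the outlier buffer.
        target = self._best(self.normal, point, uncertainty)
        if target is not None:
            target.absorb(point, uncertainty)
            return "normal"
        target = self._best(self.outliers, point, uncertainty)
        if target is not None:
            target.absorb(point, uncertainty)
            if target.count >= self.promote_count:
                # A potential outlier that has grown enough moves to the normal buffer.
                self.outliers.remove(target)
                self.normal.append(target)
                return "promoted"
            return "outlier"
        # No acceptable cluster: start a new potential outlier micro-cluster.
        self.outliers.append(MicroCluster(list(point), 1, uncertainty))
        return "new_outlier"
```

In this sketch a new point never creates a normal micro-cluster directly; it starts in the outlier buffer and is promoted only after enough nearby, low-uncertainty points arrive. That design choice is one plausible reading of keeping two buffers and should be treated as an assumption.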