Dynamic density-based clustering algorithm over uncertain data streams

In recent years, uncertain data streams, which arise in many real applications, have attracted growing attention from researchers. Existence uncertainty, one aspect of data uncertainty, can significantly affect both the clustering process and its results. Recently reported clustering algorithms for such streams are all based on the K-Means algorithm and share its inherent shortcomings. This paper proposes DCUStream, a density-based clustering algorithm over uncertain data streams. It can find arbitrarily shaped clusters at lower time cost in high-dimensional data streams. In addition, a dynamic density threshold is designed to accommodate the way grid densities change over time in a data stream. Experimental results show that DCUStream produces more accurate clustering results and runs the clustering process more efficiently on evolving uncertain data streams.
