A Density Grid-Based Clustering Algorithm for Uncertain Data Streams

This paper proposes a grid-based clustering algorithm Clu-US which is competent to find clusters of non-convex shapes on uncertain data stream. Clu-US maps the uncertain data tuples to the grid space which could store and update the summary information of stream. The uncertainty of data is taken into account for calculating the probability center of a grid. Then, the distance between the probability centers of two adjacent grids is adopted for measuring whether they are "close enough" in grids merging process. Furthermore, a dynamic outlier deletion mechanism is developed to improve clustering performance. The experimental results show that Clu-US outperforms other algorithms in terms of clustering quality and speed.

[1]  Philip S. Yu,et al.  A Survey of Uncertain Data Algorithms and Applications , 2009, IEEE Transactions on Knowledge and Data Engineering.

[2]  Andrew McGregor,et al.  Estimating statistical aggregates on probabilistic data streams , 2008, TODS.

[3]  Biao Qin,et al.  A Bayesian classifier for uncertain data , 2010, SAC '10.

[4]  Aoying Zhou,et al.  Density-Based Clustering over an Evolving Data Stream with Noise , 2006, SDM.

[5]  Andrew McGregor,et al.  Estimating statistical aggregates on probabilistic data streams , 2007, PODS.

[6]  Charu C. Aggarwal,et al.  On High Dimensional Projected Clustering of Uncertain Data Streams , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[7]  Philip S. Yu,et al.  A Framework for Clustering Uncertain Data Streams , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[8]  Wenfei Fan,et al.  Conditional functional dependencies for capturing data inconsistencies , 2008, TODS.

[9]  Chen Zhang,et al.  Tracking High Quality Clusters over Uncertain Data Streams , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[10]  Sudipto Guha,et al.  Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng..

[11]  David Wai-Lok Cheung,et al.  Clustering Uncertain Data Using Voronoi Diagrams and R-Tree Index , 2010, IEEE Transactions on Knowledge and Data Engineering.

[12]  Jeffrey Xu Yu,et al.  Sliding-window top-k queries on uncertain streams , 2008, The VLDB Journal.

[13]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.