Clustering Algorithm on Data Stream with Skew Distribution Based on Temporal Density

To address the problem of clustering data streams with skewed distributions, this paper proposes the concept of temporal density, which has a set of useful mathematical properties, most notably support for incremental computation. Based on this concept, a clustering algorithm named TDCA (temporal density based clustering algorithm) with time complexity O(c×m×lg m) is presented; it uses a tree structure for efficient storage and retrieval. TDCA can capture the temporal features of a data stream with a skewed distribution either in real time or on demand. Experimental results show that TDCA is functional and scalable.
