Approximate Clustering on Data Streams Using Discrete Cosine Transform

This study presents a clustering algorithm that operates on data transformed by the discrete cosine transform (DCT). It is a grid density-based algorithm that can identify clusters of arbitrary shape. Streaming data are transformed as they arrive and reconstructed as needed for clustering. Experimental results show that DCT approximates a data distribution efficiently using only a small number of coefficients and preserves the clusters well. The grid-based clustering algorithm works well with DCT-transformed data, demonstrating the viability of DCT for data stream clustering applications.

Keywords: Grid Density-Based Clustering, Approximate Cluster Analysis, Discrete Cosine Transform, Sampling, Data Reconstruction, Data Compression
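The pipeline described in the abstract (transform, keep a few coefficients, reconstruct, cluster on a grid) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the stream has already been summarized as a 2-D grid histogram, that compression keeps only a square block of low-frequency DCT coefficients, and that clusters are 4-connected groups of cells whose reconstructed density meets a threshold. The names `dct_matrix`, `compress`, `reconstruct`, and `grid_clusters`, and the parameters `keep` and `threshold`, are hypothetical.

```python
import numpy as np
from collections import deque


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (each row is one basis vector)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale
    return m


def compress(hist, keep):
    """2-D DCT of a grid histogram; keep only the keep x keep low-frequency block.

    Assumption: low-frequency truncation is used here for illustration only.
    """
    c_rows = dct_matrix(hist.shape[0])
    c_cols = dct_matrix(hist.shape[1])
    coeffs = c_rows @ hist @ c_cols.T
    return coeffs[:keep, :keep].copy()


def reconstruct(coeffs, shape):
    """Inverse 2-D DCT after zero-padding the kept coefficients back to `shape`."""
    padded = np.zeros(shape)
    padded[:coeffs.shape[0], :coeffs.shape[1]] = coeffs
    c_rows = dct_matrix(shape[0])
    c_cols = dct_matrix(shape[1])
    return c_rows.T @ padded @ c_cols


def grid_clusters(density, threshold):
    """Label 4-connected groups of dense cells; label 0 marks non-dense cells."""
    dense = density >= threshold
    labels = np.zeros(density.shape, dtype=int)
    n_rows, n_cols = density.shape
    current = 0
    for start in zip(*np.nonzero(dense)):
        if labels[start]:
            continue                  # already reached from an earlier cell
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:                  # breadth-first flood fill over dense cells
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < n_rows and 0 <= nc < n_cols
                        and dense[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current
```

A typical call sequence would be `coeffs = compress(hist, keep)`, then `labels, n = grid_clusters(reconstruct(coeffs, hist.shape), threshold)`. Only the `keep * keep` coefficients need to be stored between clustering requests, which is where the space saving would come from under these assumptions.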
