Clustering on demand for multiple data streams

In the data stream environment, the patterns generated by the mining techniques are usually distinct at different time because of the evolution of data. In order to deal with various types of multiple data streams and to support flexible mining requirements, we devise in this paper a clustering on demand framework, abbreviated as COD framework, to dynamically cluster multiple data streams. While providing a general framework of clustering on multiple data streams, the COD framework has two major features, namely one data scan for online statistics collection and compact multiresolution approximations, which are designed to address, respectively, the time and the space constraints in a data stream environment. Furthermore, with the multiresolution approximations of data streams, flexible clustering demands can be supported.

[1]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[2]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[3]  Ambuj K. Singh,et al.  SWAT: hierarchical stream summarization in large networks , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[4]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[5]  Eamonn J. Keogh,et al.  Clustering of time-series subsequences is meaningless: implications for previous and future research , 2003, Third IEEE International Conference on Data Mining.

[6]  Rajeev Rastogi,et al.  Processing complex aggregate queries over data streams , 2002, SIGMOD '02.

[7]  Jiong Yang Dynamic clustering of evolving streams with a single pass , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[8]  Philip S. Yu,et al.  A Regression-Based Temporal Pattern Mining Scheme for Data Streams , 2003, VLDB.

[9]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[10]  Rajeev Motwani,et al.  Approximate Frequency Counts over Data Streams , 2012, VLDB.

[11]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[12]  Eamonn J. Keogh,et al.  Clustering of time-series subsequences is meaningless: implications for previous and future research , 2004, Knowledge and Information Systems.