Clustering Transactional Data Streams

The challenge of mining data streams is threefold. First, an algorithm for a given data mining task is subject to the sequential one-pass constraint; second, it must work under bounded resources such as memory and disk space; third, it should be able to answer time-sensitive queries. Dealing with transactional data streams is even more challenging because of their high dimensionality and sparseness. In this paper, we propose algorithms for clustering transactional data streams that incorporate the incremental clustering algorithm INCLUS into the equal-width time window model and the elastic time window model. These algorithms can efficiently cluster a transactional data stream in one pass and answer time-sensitive queries at different granularities with limited resources.
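To make the equal-width time window idea concrete, the following is a minimal Python sketch written under stated assumptions. It is not the INCLUS algorithm and not the paper's implementation: the class name `EqualWidthWindowSummary`, its methods, and the choice of item frequencies as the per-window summary are all hypothetical. It only illustrates how a stream can be folded into fixed-width windows in one pass, with each transaction discarded after it is summarized, and how a time-range query can be answered by merging window summaries.

```python
from collections import Counter


class EqualWidthWindowSummary:
    """Hypothetical per-window summary store (illustration only, not INCLUS)."""

    def __init__(self, width):
        if width <= 0:
            raise ValueError("width must be positive")
        self.width = width
        # window index -> [transaction count, item frequency counter]
        self.windows = {}

    def add(self, timestamp, items):
        """Fold one transaction into its window; the raw transaction is not stored."""
        idx = int(timestamp // self.width)
        summary = self.windows.setdefault(idx, [0, Counter()])
        summary[0] += 1
        # Count each item at most once per transaction.
        summary[1].update(set(items))

    def query(self, start, end):
        """Merge the summaries of all windows overlapping the closed interval [start, end]."""
        first = int(start // self.width)
        last = int(end // self.width)
        total = 0
        freq = Counter()
        for idx, (count, items) in self.windows.items():
            if first <= idx <= last:
                total += count
                freq.update(items)
        return total, freq


stream = EqualWidthWindowSummary(width=10)
stream.add(1, ["a", "b"])
stream.add(5, ["a"])
stream.add(12, ["b", "c"])
stream.add(25, ["a"])
count, freq = stream.query(0, 19)
```

Because a query is answered at the granularity of whole windows, memory grows with the number of windows rather than with the number of transactions. An elastic window model would presumably vary the window widths, which this sketch does not attempt.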
