Paper information - Adaptive non-linear clustering in data streams

Adaptive non-linear clustering in data streams

Data stream clustering has emerged as a challenging and interesting problem over the past few years. Because streams evolve over time and the data stream model imposes a one-pass restriction, traditional clustering algorithms are inapplicable to stream clustering. The problem becomes even more challenging when the data is high-dimensional and the clusters are not linearly separable in the input space. In this paper, we propose a nonlinear stream clustering algorithm that adapts to the stream's evolutionary changes. Using kernel methods to handle the non-linearity of data separation, we propose a novel 2-tier stream clustering architecture. Tier-1 captures the temporal locality in the stream by partitioning it into segments using a kernel-based novelty detection approach. Tier-2 exploits this segment structure to continuously project the streaming data nonlinearly onto a low-dimensional space (LDS) before assigning each point to a cluster. We demonstrate the effectiveness of our approach through extensive experimental evaluation on various real-world datasets.
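
The two tiers described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the kernel, the novelty statistic, or how the projection is updated, so everything below is an assumption made for illustration. The sketch assumes a Gaussian (RBF) kernel, uses the squared feature-space distance to the segment mean as a stand-in novelty score for Tier-1, and uses a batch kernel PCA on one segment as a stand-in for the Tier-2 projection (a streaming system would presumably update the eigendecomposition incrementally rather than recompute it). The function names `rbf_kernel`, `novelty_score`, `kernel_pca_fit`, and `kernel_pca_project` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def novelty_score(segment, x, gamma=1.0):
    """Assumed Tier-1 statistic: squared feature-space distance from x to the
    mean of the current segment, computed with kernel evaluations only."""
    x = np.asarray(x, dtype=float)[None, :]
    k_xx = 1.0  # an RBF kernel of a point with itself is always 1
    k_xs = rbf_kernel(x, segment, gamma).mean()
    k_ss = rbf_kernel(segment, segment, gamma).mean()
    return k_xx - 2.0 * k_xs + k_ss


def kernel_pca_fit(segment, n_components=2, gamma=1.0):
    """Assumed Tier-2 stand-in: batch kernel PCA fitted on one segment."""
    K = rbf_kernel(segment, segment, gamma)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[top], vecs[:, top]
    # Scale eigenvectors so the feature-space principal axes have unit norm.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return {"X": segment, "K": K, "alphas": alphas, "gamma": gamma}


def kernel_pca_project(model, X_new):
    """Project new points onto the low-dimensional space of a fitted segment."""
    K = model["K"]
    K_new = rbf_kernel(X_new, model["X"], model["gamma"])
    # Center the new kernel rows consistently with the training kernel.
    Kc = (K_new
          - K_new.mean(axis=1, keepdims=True)
          - K.mean(axis=0)[None, :]
          + K.mean())
    return Kc @ model["alphas"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.normal(scale=0.1, size=(50, 5))
    model = kernel_pca_fit(segment, n_components=2)
    incoming = rng.normal(scale=0.1, size=(3, 5))
    print(kernel_pca_project(model, incoming).shape)
    print(novelty_score(segment, np.full(5, 4.0)))
```

In a stream setting, one plausible use of this sketch is to start a new segment and refit the projection whenever `novelty_score` exceeds a chosen threshold, and otherwise to project each arriving point with `kernel_pca_project` and assign it to the nearest cluster centroid in the low-dimensional space. The threshold and the clustering rule are likewise assumptions, not details taken from the paper.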

Edward Y. Chang | Zhihua Zhang | Ankur Jain

[1] Junshui Ma, et al. Online novelty detection on temporal sequences, 2003, KDD '03.

[2] Haitao Zhao, et al. Incremental eigen decomposition, 2003.

[3] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2003, ICTAI.

[4] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[5] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[6] Philip S. Yu, et al. A Framework for Clustering Evolving Data Streams, 2003, VLDB.

[7] Charu C. Aggarwal, et al. A framework for diagnosing changes in evolving data streams, 2003, SIGMOD '03.

[8] Christopher J. Merz, et al. UCI Repository of Machine Learning Databases, 1996.

[9] J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis, 1966.

[10] Alexander J. Smola, et al. Learning with kernels, 1998.

[11] Yoshua Bengio, et al. Spectral Clustering and Kernel PCA are Learning Eigenfunctions, 2003.