Clustering of High Dimensional Data Streams

Clustering of data streams has attracted great interest in recent years, as such data are becoming increasingly ubiquitous. In many cases these data are also high dimensional and, as a result, harder to cluster. There is therefore a growing need for algorithms that can be applied to streaming data and, at the same time, cope with high dimensionality. To this end, we design a streaming clustering approach by extending a recently proposed high dimensional clustering algorithm.
