A Weighted Subspace Clustering Algorithm in High-Dimensional Data Streams

Clustering is an important and difficult problem in data stream mining because large volumes of data arrive continuously. High-dimensional data streams make cluster analysis even harder because the data are sparse. In this paper, we propose a new clustering method for high-dimensional data streams, called WSCStream. The method combines a fading cluster structure with a dimensional weight matrix. The matrix assigns a weight to each dimension of each cluster, and the weight indicates how important that dimension is to the cluster. As new data points arrive over time, the weighted distance between each point and each cluster is used to obtain the final clusters. Experimental results on real and synthetic datasets demonstrate that WSCStream achieves higher clustering quality than PHStream.
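
The idea of comparing a point to clusters through per-dimension weights can be illustrated with a minimal sketch. It is not the WSCStream implementation: the abstract does not give the distance formula or the fading function, so the weighted Euclidean form and the exponential decay 2^(-decay_rate * elapsed) used here are assumptions, and the names weighted_distance, fade, nearest_cluster and decay_rate are hypothetical.

```python
import math


def weighted_distance(point, centroid, weights):
    """Weighted Euclidean distance (assumed form).

    weights[j] is the importance of dimension j to this cluster;
    a weight of 0 means the dimension is ignored for this cluster.
    """
    return math.sqrt(
        sum(w * (x - c) ** 2 for x, c, w in zip(point, centroid, weights))
    )


def fade(value, elapsed, decay_rate):
    """Decay a cluster statistic over time.

    Assumed fading function: value * 2^(-decay_rate * elapsed).
    """
    return value * 2.0 ** (-decay_rate * elapsed)


def nearest_cluster(point, centroids, weight_matrix):
    """Return the index of the cluster with the smallest weighted distance.

    weight_matrix[i] holds the dimension weights of cluster i.
    """
    distances = [
        weighted_distance(point, c, w)
        for c, w in zip(centroids, weight_matrix)
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Two clusters in two dimensions: cluster 0 only cares about dimension 0,
# cluster 1 only cares about dimension 1.
centroids = [[0.0, 0.0], [10.0, 0.0]]
weight_matrix = [[1.0, 0.0], [0.0, 1.0]]
assigned = nearest_cluster([9.0, 5.0], centroids, weight_matrix)
```

In this toy setup the point (9, 5) lies at weighted distance 9 from cluster 0 and 5 from cluster 1, so it is assigned to cluster 1 even though only one dimension of each cluster contributes. This is the effect that dimension weights are meant to capture in sparse, high-dimensional data.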