Paper Information - An Improved K-Means Clustering Algorithm Based on Spectral Method

It is well known that the K-means algorithm is very sensitive to outliers and often terminates at a local optimum. Furthermore, K-means requires the number of clusters K to be specified in advance as prior knowledge. As a result, the quality of its clustering is often unsatisfactory. In this paper, we develop an improved K-means clustering algorithm, NK-means. NK-means is based on spectral methods: it uses the Normal matrix from spectral analysis to normalize the original dataset, and then finds clusters in the processed dataset with the K-means algorithm. We also propose a measure of the strength of the cluster structure found by NK-means, which provides an objective metric for choosing the number of clusters K into which a dataset should be divided. Experiments show that NK-means significantly outperforms K-means in both efficiency and accuracy.
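The abstract does not spell out how the Normal matrix is built or applied, so the following is only a rough sketch of the general idea (normalize with a spectral matrix, then run K-means), not the authors' implementation. It assumes a Gaussian affinity matrix W with bandwidth `sigma`, takes the normal matrix to be N = D^{-1} W (D being the diagonal matrix of row sums of W), embeds each point using the k leading eigenvectors of N, and uses a deterministic farthest-first initialization for K-means. All function names and parameters are hypothetical.

```python
import numpy as np

def normal_matrix_embedding(X, k, sigma=1.0):
    """Rows of the k leading eigenvectors of N = D^{-1} W (assumed construction)."""
    # Gaussian affinity between all pairs of points (an assumption of this sketch).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)  # assumed strictly positive (no isolated points)
    # N = D^{-1} W has the same eigenvalues as the symmetric matrix
    # S = D^{-1/2} W D^{-1/2}; if S u = lam * u, then N (D^{-1/2} u) = lam * (D^{-1/2} u).
    inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, U = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    return U[:, -k:] * inv_sqrt[:, None]

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations with farthest-first initialization (an assumption)."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    labels = np.full(len(Z), -1)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def nk_means_sketch(X, k, sigma=1.0):
    """Normalize the data with the normal matrix, then cluster with K-means."""
    Z = normal_matrix_embedding(X, k, sigma)
    return kmeans(Z, k)
```

Calling `nk_means_sketch(X, k=2)` on an array `X` of shape `(n_samples, n_features)` returns one integer label per row. The cluster-strength measure mentioned in the abstract is not reproduced here, because the abstract does not define it.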
