An Improved K-Means Clustering Algorithm Based on Spectral Method

It is well known that the K-means algorithm is sensitive to outliers and often terminates at a local optimum. Furthermore, K-means requires the number of clusters K to be specified in advance as prior knowledge. As a result, the quality of its clustering results is often unsatisfactory. In this paper, we develop an improved K-means clustering algorithm, NK-means. NK-means is based on spectral methods: it uses the Normal matrix from spectral analysis to normalize the original dataset, and then finds clusters in the processed dataset with the K-means algorithm. We also propose a measure of the strength of the cluster structure found by NK-means, which provides an objective criterion for choosing the number K of clusters into which a dataset should be divided. Experiments show that NK-means significantly outperforms K-means in both efficiency and accuracy.
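
The two-stage idea described above (normalize with a Normal matrix, then run K-means on the processed data) can be illustrated with a minimal sketch. The abstract does not specify the construction, so everything below is an assumption made for illustration only: the Normal matrix is taken to be the row-normalized Gaussian affinity N = D^{-1} A, the rows of N are used directly as the processed dataset, and K-means is plain Lloyd iteration with a deterministic farthest-point initialization. The names `normal_matrix`, `kmeans`, `nk_means_sketch`, and the parameter `sigma` are hypothetical and are not taken from the paper.

```python
import math

def normal_matrix(points, sigma=1.0):
    """Assumed construction: Gaussian affinity A, row-normalized to N = D^{-1} A."""
    rows = []
    for p in points:
        # Affinity of p to every point (including itself, where it equals 1).
        row = [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        degree = sum(row)  # always >= 1 because of the self-affinity term
        rows.append([a / degree for a in row])
    return rows

def kmeans(data, k, iters=100):
    """Plain Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [list(data[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(data, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def nk_means_sketch(points, k, sigma=1.0):
    """Hypothetical pipeline: normalize via N, then cluster the rows of N."""
    return kmeans(normal_matrix(points, sigma), k)

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(nk_means_sketch(pts, k=2))
```

In this sketch, points that are close in the original space receive nearly identical rows in N, while points in different groups receive rows with almost no overlapping mass, which is one plausible reason the subsequent K-means step could become less sensitive to initialization. Whether this matches the paper's actual normalization, and how `sigma` should be chosen, is not stated in the abstract.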