Adaptive Determining for Optimal Cluster Number of K-Means Clustering Algorithm

The clustering accuracy of the K-means algorithm depends heavily on the initial number of clusters, and the algorithm is slow on large, high-dimensional data sets. To address these problems, this paper proposes reducing the dimensionality of high-dimensional data with a multidimensional scaling transformation and designs a measure that can effectively evaluate the quality of a kernel clustering algorithm. Furthermore, an adaptive method for determining the optimal cluster number is presented. It first predicts the initial cluster number in a low-dimensional space by tree clustering; the adaptive algorithm then obtains the optimal cluster number. Experiments show that this method has higher accuracy and stability.
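The idea of refining a predicted cluster number can be illustrated with a minimal sketch. It is not the paper's algorithm: it omits the multidimensional scaling step and the tree-clustering prediction, takes the initial guess `k_init` as given, and uses the mean silhouette width as a stand-in for the paper's quality measure, which the abstract does not specify. The names `make_blobs`, `farthest_point_init`, `kmeans`, `silhouette`, `choose_k`, and the `radius` parameter are all hypothetical.

```python
import random
from math import dist


def make_blobs(seed=1, per_blob=20, sigma=0.5):
    """Toy data (assumption): three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma))
            for cx, cy in blob_centers for _ in range(per_blob)]


def farthest_point_init(points, k):
    """Deterministic seeding: repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width (stand-in for the paper's unspecified measure)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_init, radius=2):
    """Score every k within `radius` of the initial guess and keep the best."""
    candidates = range(max(2, k_init - radius), k_init + radius + 1)
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best_k, scores = choose_k(make_blobs(), k_init=4)
    print(best_k, scores)
```

Searching only a small window around `k_init` is what keeps the refinement cheap: a good initial prediction means only a few values of k need a full K-means run.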
