An Automatic Index Validity for Clustering

Many validity index algorithms have been proposed to determine the number of clusters. These methods usually employ the Euclidean distance as the measure of similarity. However, the Euclidean metric has difficulty evaluating the compactness of data when nonlinear relationships exist between different components of the data. Moreover, most current algorithms cannot reliably estimate the range in which the number of clusters lies. To address these problems, we adopt a kernel-induced distance to measure the relationships among data points. We first estimate an upper bound on the number of clusters, which effectively reduces the iteration time of the validity index algorithm. We then design a kernelized validity index algorithm that automatically determines the optimal number of clusters. Experiments show that the proposed approach obtains promising results.
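
To illustrate what a kernel-induced distance is, the following minimal sketch computes the squared distance between two points in the feature space implied by a kernel K, using the standard identity ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y). The Gaussian kernel, the bandwidth parameter `sigma`, and the function names are assumptions made for illustration; the abstract does not state which kernel or parameters the proposed method uses.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel. The kernel choice and sigma are illustrative assumptions."""
    sq_euclidean = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_euclidean / (2.0 * sigma ** 2))

def kernel_distance_sq(x, y, kernel=gaussian_kernel):
    """Squared distance in the kernel-induced feature space:
    ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

if __name__ == "__main__":
    a, b = [0.0, 0.0], [3.0, 4.0]
    # Euclidean distance grows without bound, while the Gaussian
    # kernel-induced squared distance stays below 2.
    print("Euclidean:", math.dist(a, b))
    print("Kernel-induced (squared):", kernel_distance_sq(a, b))
```

For the Gaussian kernel, K(x, x) = 1, so the expression reduces to 2(1 - K(x, y)). The distance is therefore a bounded, nonlinear function of the Euclidean distance, which is one reason such distances are used when the data exhibit nonlinear structure.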
