A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm

This paper presents a cluster validity measure with a hybrid parameter search method that enables the support vector clustering (SVC) algorithm to identify an optimal cluster structure for a given data set. The cluster structure obtained by the SVC is controlled by two parameters: the kernel function parameter, denoted as q, and the soft-margin constant of the Lagrangian function, denoted as C. Reaching a satisfactory clustering result usually requires extensive trial-and-error search over these two parameters. From intensive observation of the cluster-splitting behavior, we found that (1) the overall search range of q is related to the densities of the clusters; (2) each cluster structure corresponds to an interval of q, and these intervals differ in size; and (3) identifying the optimal structure is equivalent to finding the largest of these intervals. Based on these findings, we developed a validity measure with an ad hoc parameter search algorithm that enables the SVC algorithm to identify optimal cluster configurations with a minimal number of executions. Computer simulations on benchmark data sets demonstrate the effectiveness and robustness of the proposed approach.
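Finding (3) can be illustrated with a minimal sketch. It is not the validity measure or search algorithm proposed in the paper. It assumes that SVC has already been run at several sampled values of q, and that a cluster structure can be summarized by its cluster count alone, which is a simplification. The function name `widest_stable_interval` and the sample values are hypothetical.

```python
from itertools import groupby


def widest_stable_interval(q_values, cluster_counts):
    """Return (count, q_start, q_end) for the widest run of sampled q values
    over which the cluster count stays unchanged.

    Assumption: each entry in cluster_counts is the number of clusters an
    SVC run produced at the matching q. No SVC is executed here.
    """
    if not q_values or len(q_values) != len(cluster_counts):
        raise ValueError("q_values and cluster_counts must be non-empty and equal in length")

    # Sort by q so that consecutive samples form contiguous intervals.
    pairs = sorted(zip(q_values, cluster_counts))

    best = None  # (width, count, q_start, q_end)
    for count, group in groupby(pairs, key=lambda p: p[1]):
        qs = [q for q, _ in group]
        width = qs[-1] - qs[0]
        # Keep the first widest run if several runs tie.
        if best is None or width > best[0]:
            best = (width, count, qs[0], qs[-1])

    _, count, q_start, q_end = best
    return count, q_start, q_end


# Hypothetical sweep: the count stays at 3 for q between 3.0 and 6.0.
q_samples = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
counts = [1, 1, 2, 3, 3, 3, 3, 5]
result = widest_stable_interval(q_samples, counts)
print(result)
```

A uniform sweep like this needs one SVC execution per sampled q, which is the cost the proposed search method aims to reduce.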
