A New Cluster Validity for Data Clustering

Cluster validity indexes are widely used to evaluate the quality of partitions produced by clustering algorithms. This paper presents a new validity index for data clustering, called the Vapnik–Chervonenkis-bound (VB) index. The index is estimated using the structural risk minimization (SRM) principle, which optimizes a bound on the guaranteed risk simultaneously over the distortion function (empirical risk) and the VC-dimension (model complexity). The cluster number at which this bound is smallest is taken as the best description of the data structure. We use the deterministic annealing (DA) algorithm as the underlying clustering technique to produce the partitions. Five numerical examples and two real data sets illustrate the use of VB as a validity index, and its effectiveness is compared with that of several popular cluster-validity indexes. The comparative results show that the proposed VB index produces good estimates of the cluster number and, in addition, provides a new approach to cluster validity from the viewpoint of statistical learning theory.
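
The selection rule described above (choose the cluster number whose risk bound is smallest) can be sketched as follows. This is only an illustration under stated assumptions, not the VB index itself: the confidence term is the classical Vapnik bound for indicator losses, the capacity rule `h = k * (dim + 1)` is a hypothetical placeholder, the distortion values are made-up toy numbers, and the function names `vc_style_bound` and `select_cluster_number` are invented for this example.

```python
import math


def vc_style_bound(empirical_risk, h, n, eta=0.05):
    """Empirical risk plus a Vapnik-style confidence term.

    ASSUMPTION: this is the classical bound for indicator losses, used here
    only to illustrate the trade-off; the paper's VB index may differ.
    """
    if not 0 < h < n:
        raise ValueError("expected 0 < h < n")
    confidence = math.sqrt(
        (h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n
    )
    return empirical_risk + confidence


def select_cluster_number(distortions, n, dim, eta=0.05):
    """Return (best_k, bounds), where best_k minimizes the bound.

    distortions maps each candidate k to an average distortion in [0, 1].
    ASSUMPTION: capacity is taken as h = k * (dim + 1), a placeholder rule.
    """
    bounds = {
        k: vc_style_bound(d, k * (dim + 1), n, eta)
        for k, d in distortions.items()
    }
    return min(bounds, key=bounds.get), bounds


# Made-up toy distortions: a sharp drop up to k = 3, then little improvement.
toy_distortions = {1: 0.90, 2: 0.50, 3: 0.10, 4: 0.09, 5: 0.085}
best_k, bounds = select_cluster_number(toy_distortions, n=1000, dim=2)
print(best_k)
```

With these toy values the distortion keeps falling as k grows, but the complexity term grows faster beyond k = 3, so the bound is smallest at k = 3. In practice the distortions would come from the partitions produced by the clustering algorithm (DA in the paper) at each candidate cluster number.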
