Performance Evaluation of Some Clustering Algorithms and Validity Indices

In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.

[1]  Marina Meila,et al.  An Experimental Comparison of Several Clustering and Initialization Methods , 1998, UAI.

[2]  SANGHAMITRA BANDYOPADHYAY,et al.  Clustering Using Simulated Annealing with Probabilistic Redistribution , 2001, Int. J. Pattern Recognit. Artif. Intell..

[3]  J. C. Dunn,et al.  A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters , 1973 .

[4]  T. Caliński,et al.  A dendrite method for cluster analysis , 1974 .

[5]  KrishnapuramRaghu,et al.  A Robust Competitive Clustering Algorithm With Applications in Computer Vision , 1999 .

[6]  Ujjwal Maulik,et al.  Genetic algorithm-based clustering technique , 2000, Pattern Recognit..

[7]  J. Dunn Well-Separated Clusters and Optimal Fuzzy Partitions , 1974 .

[8]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[10]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[11]  James C. Bezdek,et al.  Clustering with a genetically optimized approach , 1999, IEEE Trans. Evol. Comput..

[12]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Julius T. Tou,et al.  Pattern Recognition Principles , 1974 .

[14]  André Hardy,et al.  An examination of procedures for determining the number of clusters in a data set , 1994 .

[15]  Hichem Frigui,et al.  A Robust Competitive Clustering Algorithm With Applications in Computer Vision , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  P. Sopp Cluster analysis. , 1996, Veterinary immunology and immunopathology.

[17]  Adrian E. Raftery,et al.  How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J..