An extensive comparative study of cluster validity indices

Validating the results produced by clustering algorithms is a fundamental part of the clustering process. The most widely used approaches to cluster validation are based on internal cluster validity indices. Although many indices have been proposed, no recent extensive comparative study of their performance exists. In this paper we present the results of an experimental study that compares 30 cluster validity indices across many environments with different characteristics. These results can serve as a guideline for selecting the most suitable index for a given application and provide insight into the performance differences between the currently available indices.
