Towards a standard methodology to evaluate internal cluster validity indices

Evaluating and comparing internal cluster validity indices is a critical problem in clustering. Most evaluations rely on a methodology that assumes the clustering algorithms work correctly. We propose an alternative methodology that does not make this assumption, which is often false. We compare seven internal cluster validity indices under both methodologies and conclude that the results obtained with the proposed methodology are more representative of the indices' actual capabilities.
