Internal versus External cluster validation indexes

One of the fundamental challenges of clustering is how to evaluate results without auxiliary information. A common approach to evaluating clustering results is to use validity indexes. Clustering validity approaches can rely on external criteria (which evaluate the result with respect to a pre-specified structure) or internal criteria (which evaluate the result with respect to information intrinsic to the data alone). Consequently, different types of indexes are used to solve different types of problems, and index selection depends on the kind of information available. For that reason, this paper presents a comparison between external and internal indexes. The results obtained in this study indicate that internal indexes are more accurate at determining the groups in a given clustering structure. Six internal indexes (BIC, CH, DB, SIL, NIVA and DUNN) and four external indexes (F-measure, NMI, Entropy and Purity) were used in this study. The clusters evaluated were obtained with the K-means and Bisecting K-means clustering algorithms.
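To make the distinction between the two kinds of index concrete, the following is a minimal Python sketch of one external index (Purity) and one internal index (DUNN). It is an illustration under stated assumptions, not the implementation used in the study: the function names `purity` and `dunn`, the Euclidean distance, the particular Dunn variant (smallest between-cluster point distance over largest cluster diameter), and the four toy points are all chosen here for the example.

```python
from collections import Counter
from itertools import combinations
from math import dist, inf


def purity(labels_true, labels_pred):
    """External index: needs reference labels (the pre-specified structure).

    Each cluster is credited with the size of its majority class; the sum is
    divided by the number of points.
    """
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)


def dunn(points, labels_pred):
    """Internal index: uses only the data and the cluster assignment.

    Assumed variant: minimum distance between points of different clusters
    divided by the maximum distance between points of the same cluster.
    Requires at least two clusters.
    """
    clusters = {}
    for x, p in zip(points, labels_pred):
        clusters.setdefault(p, []).append(x)
    groups = list(clusters.values())
    max_diameter = max(
        (dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    min_separation = min(
        dist(a, b) for g, h in combinations(groups, 2) for a in g for b in h
    )
    # All clusters are single points: separation is unbounded relative to size.
    return inf if max_diameter == 0 else min_separation / max_diameter


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
truth = ["a", "a", "b", "b"]
good = [0, 0, 1, 1]   # matches the reference structure
bad = [0, 1, 0, 1]    # mixes the two groups

purity_good, purity_bad = purity(truth, good), purity(truth, bad)
dunn_good, dunn_bad = dunn(points, good), dunn(points, bad)
```

Both indexes rank the `good` partition above the `bad` one here, but only `dunn` can be computed when no reference labels such as `truth` are available, which is the situation that motivates internal indexes.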
