A Monte Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis

A Monte Carlo evaluation of thirty internal criterion measures for cluster analysis was conducted. Artificial data sets were constructed with clusters that exhibited the properties of internal cohesion and external isolation. The data sets were analyzed by four hierarchical clustering methods. The resulting values of the internal criteria were compared with two external criterion indices that measured the degree to which the algorithms recovered the correct cluster structure. The results identified a subset of internal criterion measures that appear to be valid indices of correct cluster recovery. Indices from this subset could form the basis either of a permutation test for the existence of cluster structure or of a clustering algorithm.
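The design summarized above (generate data with known clusters, run a hierarchical method, then compare an internal criterion with an external one) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of the original study. The data generator `make_clusters` is hypothetical. Only group-average linkage is shown, because the abstract does not name the four methods. The point-biserial correlation stands in as one example internal criterion and the Rand index as one example external criterion. The permutation test, which shuffles each variable independently to remove cluster structure, is likewise an assumed way to build the test the abstract proposes.

```python
import math
import random
from itertools import combinations


def make_clusters(rng, k=3, per_cluster=10, dim=2, sep=10.0):
    """Hypothetical generator: k Gaussian blobs (sd 1) whose centres are sep apart on each axis."""
    points, truth = [], []
    for c in range(k):
        for _ in range(per_cluster):
            points.append(tuple(c * sep + rng.gauss(0.0, 1.0) for _ in range(dim)))
            truth.append(c)
    return points, truth


def average_linkage(points, k):
    """Group-average (UPGMA) agglomeration, stopped when k clusters remain."""
    dmat = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(dmat[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: avg_dist(clusters[xy[0]], clusters[xy[1]]))
        merged = clusters[x] + clusters[y]
        del clusters[y]  # combinations yields x < y, so index x is unaffected
        clusters[x] = merged
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def point_biserial(points, labels):
    """Internal criterion: correlation of pairwise distance with a between-cluster 0/1 flag."""
    dist, flag = [], []
    for i, j in combinations(range(len(points)), 2):
        dist.append(math.dist(points[i], points[j]))
        flag.append(1.0 if labels[i] != labels[j] else 0.0)
    return pearson(dist, flag)


def rand_index(a, b):
    """External criterion: share of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def permutation_test(points, k, n_perm, rng):
    """Assumed null: shuffle each variable independently, destroying cluster structure."""
    observed = point_biserial(points, average_linkage(points, k))
    cols = [list(col) for col in zip(*points)]
    hits = 0
    for _ in range(n_perm):
        for col in cols:
            rng.shuffle(col)
        shuffled = list(zip(*cols))
        if point_biserial(shuffled, average_linkage(shuffled, k)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Monte Carlo loop: weak, moderate and strong separation, five replicates each.
rng = random.Random(1981)
internal, external = [], []
for sep in (0.5, 2.0, 10.0):
    for _ in range(5):
        pts, truth = make_clusters(rng, sep=sep)
        found = average_linkage(pts, k=3)
        internal.append(point_biserial(pts, found))
        external.append(rand_index(truth, found))
corr = pearson(internal, external)
print("corr(internal, external) =", round(corr, 3))

pts, _ = make_clusters(rng, sep=10.0)
p_value = permutation_test(pts, k=3, n_perm=49, rng=rng)
print("permutation p-value =", p_value)
```

In this sketch, an internal index earns trust as a proxy for recovery when its values track the external index across replicates, which is what `corr` summarizes. A small `p_value` suggests that the observed index would be unlikely if the data had no cluster structure, under the column-shuffling null assumed here.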
