Comparative evaluation of two superior stopping rules for hierarchical cluster analysis

A split-sample replication stopping rule for hierarchical cluster analysis is compared with the internal criterion previously found superior by Milligan and Cooper (1985) in their comparison of 30 different procedures. The number of latent population distributions and the extent of their overlap were systematically varied in the present evaluation of stopping-rule validity. Equal and unequal population base rates were also considered. Both stopping rules correctly identified the actual number of populations when there was essentially no overlap and clusters occupied visually distinct regions of the measurement space. The replication criterion, which is evaluated by clustering the cluster means obtained from preliminary analyses of random partitions of the original data set, was superior as the degree of overlap in population distributions increased. Neither method performed adequately when overlap obliterated visually discernible density nodes.
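To make the idea of an internal stopping rule concrete, the following is a minimal sketch, not the authors' procedure. The abstract does not name the internal criterion or the clustering method, so two assumptions are made here: the criterion is taken to be the Caliński and Harabasz variance-ratio index (reference [4]), and the hierarchy is built with Ward's minimum-variance method (reference [16]). The function names `ward_levels`, `calinski_harabasz`, and `choose_k`, and the simulated data, are illustrative only.

```python
import random

def centroid(points):
    """Mean vector of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_levels(points):
    """Agglomerate with Ward's criterion (assumed method, not from the abstract).
    Returns {k: clusters} for every hierarchy level k = n, n-1, ..., 1."""
    clusters = [[p] for p in points]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                # Increase in within-cluster sum of squares if i and j merge
                cost = ni * nj / (ni + nj) * sq_dist(
                    centroid(clusters[i]), centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels

def calinski_harabasz(clusters):
    """Variance-ratio index: (between SS / (k - 1)) / (within SS / (n - k))."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    grand = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, k_max=5):
    """Stop at the hierarchy level (2..k_max) with the largest index value."""
    levels = ward_levels(points)
    scores = {k: calinski_harabasz(levels[k]) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Illustrative data: three well-separated bivariate normal populations
# with equal base rates (20 points each). These values are hypothetical.
rng = random.Random(0)
centers = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in centers for _ in range(20)]
best_k, scores = choose_k(points)
```

With essentially no overlap, as in this simulated case, the index should peak at the true number of populations. Moving the centers closer together is a simple way to explore how such a criterion degrades as overlap increases.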

[1] G. W. Milligan, et al. A Study of the Beta-Flexible Clustering Method. 1989, Multivariate Behavioral Research.

[2] C. Edelbrock. Mixture Model Tests of Hierarchical Clustering Algorithms: The Problem of Classifying Everybody. 1979, Multivariate Behavioral Research.

[3] Brian Everitt, et al. Cluster Analysis. 1974.

[4] T. Caliński, et al. A Dendrite Method for Cluster Analysis. 1974.

[5] Roger K. Blashfield, et al. Mixture Model Tests of Cluster Analysis: Accuracy of Four Agglomerative Hierarchical Methods. 1976.

[6] M. Aldenderfer. Cluster Analysis. 1984.

[7] E. A. Haggard, et al. Intraclass Correlation and the Analysis of Variance. 1958.

[8] R. Mojena, et al. Hierarchical Grouping Methods and Stopping Rules: An Evaluation. 1977, Comput. J.

[9] John E. Overall, et al. Replication as a Rule for Determining the Number of Clusters in Hierarchial Cluster Analysis. 1992.

[10] Mutsuo M. Yanase, et al. Fuzziness and Probability. 1985.

[11] G. W. Milligan, et al. An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms. 1980.

[12] R. Blashfield, et al. A Nearest-Centroid Technique for Evaluating the Minimum-Variance Clustering Procedure. 1980.

[13] J. E. Overall, et al. Population Recovery Capabilities of 35 Cluster Analysis Methods. 1993, Journal of Clinical Psychology.

[14] G. W. Milligan, et al. An Examination of Procedures for Determining the Number of Clusters in a Data Set. 1985.

[15] Anil K. Jain, et al. Algorithms for Clustering Data. 1988.

[16] J. H. Ward. Hierarchical Grouping to Optimize an Objective Function. 1963.

[17] J. Breckenridge. Replicating Cluster Analysis: Method, Consistency, and Validity. 1989, Multivariate Behavioral Research.

[18] J. Overall, et al. Applied Multivariate Analysis. 1983.

[19] Charles K. Bayne, et al. Monte Carlo Comparisons of Selected Clustering Procedures. 1980, Pattern Recognit.

[20] Leslie C. Morey, et al. A Comparison of Four Clustering Methods Using MMPI Monte Carlo Data. 1980.

[21] H. Diesenhaus, et al. Direction of Measurement and Profile Similarity. 1967, Multivariate Behavioral Research.