The Effects of Within-group Covariance Structure on Recovery in Cluster Analysis: I. The Bivariate Case.

Two preliminary Monte Carlo studies investigated the effects of within-group covariance structure on subgroup recovery by 10 hierarchical clustering methods. Each data set comprised 100 bivariate observations drawn from two subgroups. Study 1 manipulated subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids. Negative within-group correlation yielded much poorer recovery for all clustering methods. In addition, clustering method interacted with within-group variance. Study 2 manipulated subgroup size, within-group correlation, direction of the vector separating subgroup centroids, and distance between subgroup centroids. Superior recovery was associated with within-group correlation that matched the direction of subgroup separation. Results are interpreted in terms of the weakness of Euclidean distance as a measure of (dis)similarity.
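The kind of design described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the unit within-group standard deviation, the centroid distance of 3, the 50 replications, the use of SciPy's linkage routines, and the choice of the adjusted Rand index as the recovery measure are all assumptions made for illustration. The sketch draws 100 bivariate normal observations from two subgroups sharing one within-group covariance matrix, places the second centroid at a chosen distance and direction from the first, clusters on Euclidean distance, and scores agreement between the two-cluster solution and the true subgroups.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def simulate_two_groups(n1, n2, rho, sd, separation, angle_deg, rng):
    """Draw two bivariate normal subgroups with a common covariance matrix.

    All parameter names are hypothetical; they mirror the manipulated
    factors (subgroup size, correlation, variance, centroid distance and
    direction) but not any specific levels used in the studies.
    """
    cov = sd ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    theta = np.deg2rad(angle_deg)
    shift = separation * np.array([np.cos(theta), np.sin(theta)])
    g1 = rng.multivariate_normal(np.zeros(2), cov, size=n1)
    g2 = rng.multivariate_normal(shift, cov, size=n2)
    return np.vstack([g1, g2]), np.repeat([0, 1], [n1, n2])


def adjusted_rand(a, b):
    """Chance-corrected agreement between two partitions (adjusted Rand)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # contingency table of the two partitions
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(ai), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_recovery(method, rho, angle_deg, reps=50, seed=0):
    """Average recovery over replications for one design cell."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(reps):
        X, truth = simulate_two_groups(50, 50, rho, 1.0, 3.0, angle_deg, rng)
        Z = linkage(X, method=method, metric="euclidean")
        found = fcluster(Z, t=2, criterion="maxclust")  # cut at two clusters
        scores.append(adjusted_rand(truth, found))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Two of SciPy's linkage methods stand in for the 10 methods studied.
    for method in ("ward", "average"):
        for rho in (-0.5, 0.0, 0.5):
            for angle in (45, 135):
                score = mean_recovery(method, rho, angle)
                print(f"{method:8s} rho={rho:+.1f} angle={angle:3d} ARI={score:.3f}")
```

Varying `rho` and `angle_deg` together makes it possible to see how the orientation of the within-group scatter relative to the line joining the centroids changes recovery when clustering relies on unweighted Euclidean distance. The magnitudes produced by this sketch depend on the assumed settings and should not be read as the studies' results.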
