Methodology Review: Clustering Methods

A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations that are known to exist in the data. Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would benefit from continued research into the methodology. The other set offers recommendations for applied analyses within the framework of the clustering process.
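
Validation by Monte Carlo simulation amounts to generating data whose cluster membership is known, running a method on it, and scoring how well the known partition is recovered. The following is a minimal sketch of that loop in Python (standard library only), under assumptions that are ours and not the review's. The Gaussian data generator and every function name are hypothetical. Average linkage stands in for whichever hierarchical method is under test. Recovery is scored with the Rand (1971) index, the share of object pairs that two partitions treat the same way (together or apart). A real study would vary many more design factors, such as the number of clusters, dimensionality, and error conditions.

```python
import itertools
import math
import random


def make_clusters(k, n_per, sep, rng):
    """Draw k spherical Gaussian clusters (unit s.d.) in 2-D with known labels.

    Hypothetical generator: centres sit on a circle of radius `sep`, so a
    larger `sep` gives better-separated clusters.
    """
    points, labels = [], []
    for c in range(k):
        angle = 2 * math.pi * c / k
        cx, cy = sep * math.cos(angle), sep * math.sin(angle)
        for _ in range(n_per):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(c)
    return points, labels


def average_linkage(points, k):
    """Naive agglomerative (average-linkage) clustering, stopped at k groups."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            total = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (a < b)
        del clusters[b]
    labels = [0] * len(points)
    for g, members in enumerate(clusters):
        for i in members:
            labels[i] = g
    return labels


def rand_index(u, v):
    """Rand (1971) index: share of object pairs on which two partitions agree."""
    pairs = list(itertools.combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def monte_carlo_recovery(reps=5, k=3, n_per=10, sep=6.0, seed=1):
    """Mean Rand index of recovered vs. true partitions over `reps` datasets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(reps):
        points, truth = make_clusters(k, n_per, sep, rng)
        scores.append(rand_index(truth, average_linkage(points, k)))
    return sum(scores) / reps


print(monte_carlo_recovery(sep=10.0), monte_carlo_recovery(sep=0.5))
```

Averaging over replications is what makes the score a Monte Carlo estimate. Comparing mean recovery across separation levels, or across several algorithms applied to the same simulated datasets, gives the basic design of a recovery study.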

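For the inference problem of determining the number of clusters, one common tactic is to compute an internal criterion for each candidate number of groups k and keep the k that optimizes it. The sketch below assumes one such criterion, the pseudo-F of Caliński and Harabasz (1974), CH(k) = [B/(k − 1)] / [W/(n − k)], where B and W are the between-cluster and within-cluster sums of squares and n is the number of objects. It pairs the criterion with k-means (a partitioning method) seeded by farthest-first traversal. The criterion, the clustering method, the simulated data, and all function names are illustrative choices, not the procedure recommended in this review.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centres = [points[0]]
    while len(centres) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda g: sq_dist(p, centres[g])) for p in points]
        new = []
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            new.append(centroid(members) if members else centres[g])
        if new == centres:  # assignments have stabilised
            break
        centres = new
    return labels


def calinski_harabasz(points, labels):
    """Pseudo-F of Calinski and Harabasz (1974): [B / (k - 1)] / [W / (n - k)]."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    n, k = len(points), len(groups)
    grand = centroid(points)
    between = sum(len(m) * sq_dist(centroid(m), grand) for m in groups.values())
    within = sum(sq_dist(p, centroid(m)) for m in groups.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, candidates=range(2, 7)):
    """Return the candidate k with the largest pseudo-F, plus all scores."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical test data: four unit-variance clusters at the corners of a square.
rng = random.Random(7)
corners = [(0, 0), (10, 0), (0, 10), (10, 10)]
data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for cx, cy in corners for _ in range(15)]
best_k, scores = choose_k(data)
print(best_k, scores)
```

Because the pseudo-F divides B by k − 1 and W by n − k, splitting a genuine cluster lowers W only slightly while raising the penalty, so the criterion tends to peak near the true number of well-separated groups. It is a heuristic stopping rule, not a significance test for cluster structure.
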