On Some Invariant Criteria for Grouping Data

Abstract: This paper deals with methods of “cluster analysis”. In particular, we attack the problem of exploring the structure of multivariate data in search of “clusters”. The approach taken is to use a computer procedure to obtain the “best” partition of n objects into g groups. A number of mathematical criteria for “best” are discussed and related to statistical theory. A procedure for optimizing the criteria is outlined. Some of the criteria are compared with respect to their behavior on actual data, and the results of the data analysis are presented and discussed.
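To make the idea of scoring a partition of n objects into g groups concrete, here is a minimal sketch. The abstract does not name the criteria, so the two used here are assumptions chosen for illustration: the trace and the determinant of the pooled within-group scatter matrix W. The determinant is a natural example of an invariant criterion, because a nonsingular linear transformation A of the data turns W into A W Aᵀ and so rescales det(W) by the same factor det(A)² for every partition, leaving the ranking of partitions unchanged. The function names and the exhaustive search are hypothetical; they stand in for the optimizing procedure the paper outlines and are feasible only for very small n.

```python
from itertools import product

def within_scatter(points, labels):
    """Pooled within-group scatter matrix W for a given partition."""
    p = len(points[0])
    W = [[0.0] * p for _ in range(p)]
    for g in sorted(set(labels)):
        members = [x for x, lab in zip(points, labels) if lab == g]
        mean = [sum(col) / len(members) for col in zip(*members)]
        for x in members:
            d = [xi - mi for xi, mi in zip(x, mean)]
            for i in range(p):
                for j in range(p):
                    W[i][j] += d[i] * d[j]
    return W

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

def best_partition(points, g, criterion):
    """Exhaustive search (illustration only): try every labeling that
    uses all g groups and keep the one with the smallest criterion.
    The first object is fixed in group 0 to skip some relabelings."""
    best = None
    for rest in product(range(g), repeat=len(points) - 1):
        labels = (0,) + rest
        if len(set(labels)) < g:
            continue
        score = criterion(within_scatter(points, labels))
        if best is None or score < best[0]:
            best = (score, labels)
    return best

# Toy data (made up for this sketch): two well-separated groups in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
trace_score, trace_labels = best_partition(points, 2, trace)
det_score, det_labels = best_partition(points, 2, det)
```

On this toy data both criteria pick the same grouping. The number of labelings grows like g to the power n, which is why a practical procedure has to optimize the criterion by some search strategy rather than by enumeration.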
