Quantified coefficients of association and measurement of similarity

Coefficients of association have been widely employed in cluster analysis. However, their use has been, for the most part, restricted to binary data. This limitation can be overcome by redefining positive and negative matches and mismatches in terms of minimum and maximum values of paired elements of parallel vector arrays. Rewriting the algorithms of coefficients of association with these new components gives the new “quantified” coefficients general utility for binary, ordered multistate, and quantitative data, while retaining their original analytic properties. Quantified coefficients of association avoid several problems of shape and size that are associated with correlation coefficients and measures of Euclidean distance. However, when measuring similarity, quantified coefficients weight each attribute of an object by that attribute's magnitude. A related set of similarity indices termed “mean ratios” is introduced; these indices give each attribute equal weight in all situations. Both quantified coefficients of association and mean ratios are related to a number of measures of similarity introduced to various fields of scientific research during the past 50 years. A review of this literature is included in an attempt to consolidate methodology and simplify nomenclature.
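The abstract gives no formulas, so the following is only a minimal sketch of one plausible reading of the min/max idea. It assumes non-negative attribute values and takes the Jaccard coefficient as the example. The function names `quantified_jaccard` and `mean_ratio`, and the convention that a pair of zeros counts as full agreement, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    # Both vectors must describe the same attributes, in the same order.
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this sketch assumes non-negative values")


def quantified_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of pairwise minima divided by sum of pairwise maxima.

    For 0/1 data this reduces to the binary Jaccard coefficient.
    Attributes with large values dominate both sums.
    """
    _check(x, y)
    shared = sum(min(a, b) for a, b in zip(x, y))
    total = sum(max(a, b) for a, b in zip(x, y))
    # Assumption: two all-zero vectors are treated as identical.
    return shared / total if total else 1.0


def mean_ratio(x: Sequence[float], y: Sequence[float]) -> float:
    """Average of per-attribute min/max ratios.

    Each attribute contributes equally, whatever its magnitude.
    """
    _check(x, y)
    ratios = []
    for a, b in zip(x, y):
        hi = max(a, b)
        # Assumption: a 0/0 pair counts as full agreement (ratio 1).
        ratios.append(min(a, b) / hi if hi else 1.0)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    x = [100.0, 1.0]
    y = [100.0, 2.0]
    print(quantified_jaccard(x, y))  # dominated by the large first attribute
    print(mean_ratio(x, y))          # both attributes weighted equally
```

With `x = [100, 1]` and `y = [100, 2]`, the first function returns 101/102 because the large first attribute dominates, while the second returns 0.75 because the disagreement on the small attribute counts as much as the agreement on the large one. This is the weighting contrast the abstract describes.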
