On Similarity Indices and Correction for Chance Agreement

Similarity indices can be used to compare partitions (clusterings) of a data set. Many such indices have been introduced in the literature over the years. We show that of the 28 indices we were able to track, only 22 are distinct. Although these indices take different values for the same pair of clusterings, after correction for agreement attributable to chance alone their values become similar, and some of them become equivalent. Consequently, the choice of index for comparing clusterings becomes less important.
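The pair-counting idea behind these indices, and the correction for chance S' = (S - E(S)) / (1 - E(S)), can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes the fixed-marginals (hypergeometric) model of chance, under which the expected number of pairs grouped together in both clusterings is E(a) = PQ/N. It also covers only indices that are linear in a once the marginals are fixed (here Rand and Fowlkes-Mallows), so that E(S) is obtained by evaluating the index at E(a). All function names are hypothetical.

```python
from math import comb, isclose, sqrt
from collections import Counter


def pair_counts(labels_u, labels_v):
    """Return (a, P, Q, N) for two labelings of the same objects.

    a: pairs placed together in both clusterings, sum of C(n_ij, 2)
    P: pairs placed together in U, sum of C(n_i., 2)
    Q: pairs placed together in V, sum of C(n_.j, 2)
    N: total number of object pairs, C(n, 2)
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("labelings must cover the same objects")
    cells = Counter(zip(labels_u, labels_v))  # contingency table n_ij
    rows = Counter(labels_u)                    # row margins n_i.
    cols = Counter(labels_v)                    # column margins n_.j
    a = sum(comb(x, 2) for x in cells.values())
    P = sum(comb(x, 2) for x in rows.values())
    Q = sum(comb(x, 2) for x in cols.values())
    return a, P, Q, comb(len(labels_u), 2)


def rand(a, P, Q, N):
    # (a + d) / N, where d = N - P - Q + a pairs are separated in both
    return (N - P - Q + 2 * a) / N


def fowlkes_mallows(a, P, Q, N):
    # Geometric mean of a/P and a/Q; assumes P > 0 and Q > 0
    return a / sqrt(P * Q)


def corrected(index, labels_u, labels_v):
    """(S - E[S]) / (1 - E[S]) for an index linear in a, with maximum 1.

    Assumption: fixed marginals, so E[a] = P*Q/N. Because the index is
    linear in a when P, Q, N are fixed, E[S] equals S evaluated at E[a].
    """
    a, P, Q, N = pair_counts(labels_u, labels_v)
    expected_a = P * Q / N
    s = index(a, P, Q, N)
    e = index(expected_a, P, Q, N)
    return (s - e) / (1 - e)


def hubert_arabie_ari(labels_u, labels_v):
    # Closed-form adjusted Rand index, for comparison with corrected(rand, ...)
    a, P, Q, N = pair_counts(labels_u, labels_v)
    expected_a = P * Q / N
    return (a - expected_a) / ((P + Q) / 2 - expected_a)


if __name__ == "__main__":
    u = [0, 0, 0, 1, 1, 1]
    v = [0, 0, 1, 1, 2, 2]
    counts = pair_counts(u, v)
    print("raw Rand:", rand(*counts))
    print("raw Fowlkes-Mallows:", fowlkes_mallows(*counts))
    print("corrected Rand:", corrected(rand, u, v))
    print("corrected Fowlkes-Mallows:", corrected(fowlkes_mallows, u, v))
    # The corrected Rand index matches the Hubert-Arabie adjusted Rand index
    assert isclose(corrected(rand, u, v), hubert_arabie_ari(u, v))
```

On this small example the raw Rand (about 0.667) and Fowlkes-Mallows (about 0.471) values differ noticeably, while the corrected values (about 0.242 and 0.263) are much closer, and the corrected Rand index coincides with the Hubert-Arabie adjusted Rand index. Jaccard is deliberately left out: it is not linear in a, so evaluating it at E(a) would not give its expectation.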