Asymmetric binary similarity measures

Summary

Asymmetry in binary data arises when one of the two states (e.g. state “1”) is interpreted as more informative than the other. A common example in ecology occurs when one state represents the presence of some unit and the other represents its absence. The problem of classifying individuals on the basis of a set of such characters is related to the goal of group homogeneity. The homogeneity of a group of individuals is defined as the count, taken over all possible pairs of individuals and all characters, of the number of shared 1 states minus the number of mismatches (0–1 or 1–0 combinations). Shared 0 states are therefore neutral with respect to 1-state homogeneity.

The behaviour of some common binary similarity measures is examined in relation to 1-state homogeneity. Although the Jaccard coefficient comes close to having the desired behaviour, it behaves undesirably for some data values and imposes a proportionality relationship between matches and mismatches that may not always be desirable. A new coefficient, “C”, is introduced which overcomes these problems and leads to homogeneous classifications in the sense described above. General recommendations are also made for the use of these coefficients in various contexts.
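The homogeneity definition above can be made concrete with a short computation. The sketch below is illustrative only: it assumes that “all possible pairs” means unordered pairs, each counted once, and that individuals are given as equal-length lists of 0/1 values. The function names `group_homogeneity` and `jaccard` are hypothetical. For comparison it also computes the standard Jaccard coefficient, a / (a + b + c), where a counts shared 1 states and b + c counts mismatches. Like homogeneity, Jaccard ignores shared 0 states.

```python
import math
from itertools import combinations
from typing import Sequence


def group_homogeneity(group: Sequence[Sequence[int]]) -> int:
    """1-state homogeneity of a group of binary profiles (hypothetical helper).

    Sums, over every unordered pair of individuals and every character:
    +1 for a shared 1 (1-1), -1 for a mismatch (0-1 or 1-0),
    and 0 for a shared 0 (0-0), which is neutral.
    """
    total = 0
    for x, y in combinations(group, 2):
        if len(x) != len(y):
            raise ValueError("all profiles must have the same number of characters")
        for xi, yi in zip(x, y):
            if xi == 1 and yi == 1:
                total += 1  # shared presence counts in favour of homogeneity
            elif xi != yi:
                total -= 1  # mismatch counts against homogeneity
            # shared absence (0-0) contributes nothing
    return total


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Standard Jaccard coefficient a / (a + b + c) for two binary profiles.

    Returns NaN when neither profile has any 1 state, because the
    coefficient is then 0/0 and undefined.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # shared 1 states
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)     # b + c
    if a + mismatches == 0:
        return math.nan
    return a / (a + mismatches)


if __name__ == "__main__":
    group = [[1, 1, 0], [1, 1, 0], [1, 0, 0]]
    print(group_homogeneity(group))  # pairs contribute 2, 0, 0
    # Appending a character that is 0 for everyone leaves homogeneity unchanged.
    print(group_homogeneity([row + [0] for row in group]))
    print(jaccard([1, 1, 0], [1, 0, 0]))
```

Running the script prints 2, 2 and 0.5. The second value illustrates the neutrality of shared 0 states: an all-absent character changes neither the homogeneity score nor any pairwise Jaccard value.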