Chance-corrected classification for use in discriminant analysis: Ecological applications

INTRODUCTION

Ecologists use discriminant analysis, in part, to examine how correctly species or individuals are classified into functional or taxonomic groups on the basis of a set of predictor variables (Williams, 1981, 1983). The effectiveness of different sets of variables in discriminating between groups can thus be assessed. A problem typically encountered in applying discriminant analysis is unequal group sample sizes. In extreme situations, unequal group sizes may lead to a very high percentage of correct classification even though the improvement over random correct classification is slight. As an example, if one wished to classify individuals of species A and B with sample sizes of 25 and 75, respectively, the chance probability of correct classification is not 50% for each group. Any individual has an a priori probability of 0.25 of belonging to species A and 0.75 of belonging to species B. A researcher who does not apply a chance-corrected procedure cannot tell how far the observed rate of correct classification exceeds the rate expected by chance. While a chance-corrected measure of correct prediction becomes more important as sample sizes become more disparate, such a procedure is useful even with equal group sample sizes. We present an explanation of Cohen's kappa statistic, which is useful in interpreting the classification results of discriminant analysis whether group sample sizes are equal or unequal. A numerical example, taken from Cody (1978), is given in Table 1. The statistic was developed by Cohen (1960) as a method for objectively computing the chance-corrected percentage of agreement between actual and predicted group memberships. Cohen (1968) later presented a generalized form, which was subsequently applied to discriminant analysis in the educational literature by Wiedemann and Fenster (1978).
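The species A and B example can be made concrete with a minimal sketch of the standard unweighted kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of correct classifications and p_e is the proportion expected by chance from the row and column totals. The function name `cohen_kappa` and both classification tables below are hypothetical illustrations; they are not the data of Table 1 or of Cody (1978).

```python
def cohen_kappa(confusion):
    """Unweighted Cohen's kappa for a square classification table.

    confusion[i][j] is the number of cases whose actual group is i
    and whose predicted group is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed proportion of agreement: the diagonal of the table.
    p_observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: product of actual (row) and predicted (column)
    # marginal proportions, summed over groups.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical table 1: every individual is assigned to species B.
# 75% are classified correctly, yet nothing is gained over chance.
all_to_b = [[0, 25],
            [0, 75]]

# Hypothetical table 2: 85% correct, with species A partly recovered.
mixed = [[15, 10],
         [5, 70]]

kappa_all_to_b = cohen_kappa(all_to_b)
kappa_mixed = cohen_kappa(mixed)
print(f"all assigned to B: kappa = {kappa_all_to_b:.3f}")
print(f"mixed table:       kappa = {kappa_mixed:.3f}")
```

In the first table the 75% correct classification is exactly what the marginal totals predict, so kappa is zero. In the second, 85% correct corresponds to a kappa of roughly 0.57, because 65% agreement would already be expected by chance.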