The analysis of distance of grouped data with categorical variables: Categorical canonical variate analysis

We use generalised biplots to develop the important special case in which (i) all variables are categorical and (ii) the samples fall into K recognised groups. We term this Categorical Canonical Variate Analysis (CatCVA) because it shares characteristics with Rao's Canonical Variate Analysis (CVA), especially its visual aspects. It allows group centroids to be exhibited in increasing numbers of dimensions, together with information on within-group sample variation. Variables are represented by category-level-points (CLPs), which are the counterpart of numerically calibrated biplot axes for quantitative variables. Mechanisms are provided for relating the samples to their category levels, for constructing convex regions that help predict categories, and for adding new samples. Inter-sample distance may be measured by any Euclidean-embeddable distance. Computation is minimised by working in the (K - 1)-dimensional space containing the group centroids. The methodology is illustrated by an example with three groups and 37 samples, but the number of samples is not a serious limitation. The visualisation of group structure is the main focus of this paper; computational efficiency is a bonus.
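As a rough illustration of why the group centroids can be handled in K - 1 dimensions, the following Python sketch computes squared distances between group centroids directly from inter-sample squared distances and then embeds the K centroids by classical scaling. It is a minimal sketch under stated assumptions, not the paper's implementation: the mismatch-count distance, the unweighted centring, the toy data, and the function names `mismatch_sq_dist`, `centroid_sq_dist` and `pcoa` are all choices made here for illustration.

```python
import numpy as np

def mismatch_sq_dist(X):
    """Squared distance = number of variables on which two samples differ.
    (Assumed distance; any Euclidean-embeddable distance could be used.)"""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).sum(axis=2).astype(float)

def centroid_sq_dist(D2, groups):
    """Squared distances between group centroids, from sample squared distances."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    G = (groups[:, None] == labels[None, :]).astype(float)  # n x K indicator
    n_g = G.sum(axis=0)                                     # group sizes
    mean_d2 = (G.T @ D2 @ G) / np.outer(n_g, n_g)           # K x K group means
    within = np.diag(mean_d2)                               # within-group means
    return mean_d2 - 0.5 * (within[:, None] + within[None, :]), labels

def pcoa(D2):
    """Classical scaling of K points; returns K x (K - 1) coordinates.
    Uses unweighted centring (an assumption of this sketch)."""
    K = D2.shape[0]
    J = np.eye(K) - np.ones((K, K)) / K
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][: K - 1]                 # largest K - 1 first
    vals = np.clip(vals[order], 0.0, None)                  # guard rounding error
    return vecs[:, order] * np.sqrt(vals)

# Invented toy data: 6 samples, 2 categorical variables, 3 groups.
X = np.array([["a", "x"], ["a", "y"],
              ["b", "x"], ["b", "y"],
              ["c", "y"], ["c", "x"]])
groups = np.array([0, 0, 1, 1, 2, 2])

D2 = mismatch_sq_dist(X)
C2, labels = centroid_sq_dist(D2, groups)
Y = pcoa(C2)  # centroid coordinates in K - 1 = 2 dimensions
```

Only the K x K matrix of centroid distances is decomposed, so the cost of the eigenproblem depends on the number of groups rather than the number of samples; individual samples would then have to be added to the centroid display as a separate step.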