Cluster-based visualisation with scatter matrices

The trace of the scatter matrix, which measures separation between population cohorts, is shown to be strictly preserved by sphering the data followed by a projection onto the space of population means. This result suggests using the space of means as a basis for calculating well-separating lower-dimensional projections of the data, derived from the scatter matrix in the projective space. In particular, it defines an approximation to the canonical decomposition of the scatter matrix that applies to singular covariance matrices. The method is illustrated with reference to k-means clusters in data sets from bioinformatics and marketing.
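
The trace-preservation claim can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: it takes the scatter matrix to be S_W^{-1} S_B (within-cluster and between-cluster covariance), spheres with respect to a non-singular S_W, and uses synthetic Gaussian data with fixed labels standing in for k-means assignments. All function names are hypothetical.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-cluster (S_W) and between-cluster (S_B) covariance of X."""
    grand_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)
        S_B += len(Xk) * np.outer(mk - grand_mean, mk - grand_mean)
    return S_W / len(X), S_B / len(X)

def separation(X, labels):
    """Trace of S_W^{-1} S_B (assumed definition of the separation measure)."""
    S_W, S_B = scatter_matrices(X, labels)
    return np.trace(np.linalg.solve(S_W, S_B))

def sphere(X, labels):
    """Centre the data and whiten it so that S_W becomes the identity.
    Assumes S_W is non-singular."""
    S_W, _ = scatter_matrices(X, labels)
    evals, evecs = np.linalg.eigh(S_W)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric S_W^{-1/2}
    return (X - X.mean(axis=0)) @ W

def project_onto_means(Z, labels):
    """Project sphered data onto an orthonormal basis of the space
    spanned by the centred cluster means."""
    means = np.stack([Z[labels == k].mean(axis=0) for k in np.unique(labels)])
    centred = means - Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(centred, full_matrices=False)
    basis = Vt[s > 1e-10 * s.max()].T  # keep directions with non-zero spread
    return Z @ basis

# Synthetic example: 3 cohorts in 5 dimensions (illustrative data only).
rng = np.random.default_rng(0)
centres = rng.normal(scale=3.0, size=(3, 5))
labels = np.repeat(np.arange(3), 100)
X = centres[labels] + rng.normal(size=(300, 5))

Z = sphere(X, labels)
Y = project_onto_means(Z, labels)  # at most K - 1 = 2 columns
print(separation(X, labels), separation(Z, labels), separation(Y, labels))
```

The three printed values should agree up to rounding error, while Y has only two columns, so it can be plotted directly as a two-dimensional view of the cohorts. The singular-covariance case mentioned above is not covered by this sketch.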