Sum of Squares Decomposition for Categorical Data

The decomposition of the total sum of squares (SS) into a within-group sum of squares (WSS) and a between-group sum of squares (BSS) is often used as a criterion for judging the quality of clusters. However, this criterion applies only to continuous data. The aim of this paper is to extend it to categorical data. We adopt Gini’s definition of the SS for a single categorical variable, extend that definition to the multivariate case, and decompose the resulting SS into within-group and between-group components. The results give a theoretical foundation for the selection of discrimination and characteristic rules from a lattice.
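As a rough illustration of the single-variable case, the sketch below assumes the commonly cited form of Gini's variation, SS = n/2 - (1/(2n)) * sum_k n_k^2, where n_k is the number of observations in category k. It computes WSS as the sum of this quantity over groups and takes BSS as the remainder. The function names gini_ss and decompose are hypothetical, and the multivariate extension developed in the paper is not reproduced here.

```python
from collections import Counter


def gini_ss(values):
    """Gini's sum of squares for one categorical variable.

    Assumed form: SS = n/2 - (1/(2n)) * sum_k n_k^2, i.e. half the
    average number of ordered pairs that fall in different categories.
    """
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return n / 2 - sum(c * c for c in counts.values()) / (2 * n)


def decompose(values, groups):
    """Return (TSS, WSS, BSS) for a grouping of the observations.

    WSS is the sum of gini_ss over the groups; BSS is taken as the
    remainder TSS - WSS (an assumption of this sketch).
    """
    tss = gini_ss(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    wss = sum(gini_ss(members) for members in by_group.values())
    return tss, wss, tss - wss


if __name__ == "__main__":
    values = ["a", "a", "b", "b"]
    # Groups that match the categories: all variation is between groups.
    print(decompose(values, [0, 0, 1, 1]))
    # Groups that mix the categories evenly: all variation is within groups.
    print(decompose(values, [0, 1, 0, 1]))
```

Under this definition, a grouping that separates the categories perfectly puts all of the SS into BSS, whereas a grouping whose groups mirror the overall category proportions leaves it all in WSS, which is the behaviour one would want from a cluster-quality criterion.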