Sum of Squares Decomposition for Categorical Data
The decomposition of the sum of squares (SS) into a within-group sum of squares (WSS) and a between-group sum of squares (BSS) is often employed as a criterion for judging the quality of clusters. However, the method is applicable only to continuous data. The aim of this paper is to present an extension of this criterion to categorical data. We employ Gini’s definition of the SS for a single categorical variable. The definition is extended to the multivariate case, and its decomposition is conducted in a reasonable manner. The results presented give a theoretical foundation for the selection of discrimination and characteristic rules from a lattice.
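The univariate case can be illustrated with a minimal sketch. It assumes the standard form of Gini's SS for a categorical variable, SS = n/2 - (1/(2n)) Σ_a n_a², where n_a is the count of category a, and takes WSS as the sum of the per-group SS values. The function names `gini_ss` and `decompose` are hypothetical, and the multivariate extension described in the abstract is not reproduced here.

```python
from collections import Counter


def gini_ss(values):
    """Gini's sum of squares for one categorical variable.

    Equivalent to (1 / (2n)) times the number of ordered pairs of
    observations that fall in different categories.
    """
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return n / 2 - sum(c * c for c in counts.values()) / (2 * n)


def decompose(values, groups):
    """Return (TSS, WSS, BSS) for `values` partitioned by `groups`.

    Assumption: WSS is the sum of Gini's SS within each group,
    and BSS is defined as TSS - WSS.
    """
    tss = gini_ss(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    wss = sum(gini_ss(vs) for vs in by_group.values())
    return tss, wss, tss - wss


# Groups that separate the categories perfectly: all of the SS is between groups.
print(decompose(["a", "a", "b", "b"], [0, 0, 1, 1]))  # (1.0, 0.0, 1.0)

# Groups that mirror the overall distribution: all of the SS stays within groups.
print(decompose(["a", "b", "a", "b"], [0, 0, 1, 1]))  # (1.0, 1.0, 0.0)
```

In this sketch a larger BSS relative to the total SS indicates that the grouping explains more of the variation in the categorical variable, which parallels the use of the criterion for continuous data.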