Tree-Structured Classification via Generalized Discriminant Analysis.

Abstract The problem of constructing classification rules that can be represented as decision trees is considered. Each object to be classified has an associated x vector containing possibly incomplete covariate information. Tree construction is based on the information provided in a “learning sample” of objects with known class identities. The x vectors in the learning sample may have missing values as well. Procedures are proposed for each of the components of classifier construction, such as split selection, tree-size determination, treatment of missing values, and ranking of variables. The main idea is recursive application of linear discriminant analysis, with the variables at each stage being appropriately chosen according to the data and the type of splits desired. Standard statistical techniques used as basic building blocks include analysis of variance, linear and canonical discriminant analysis, and principal component analysis. A new method of tree-structured classification is obtained by assem...