SPLIT SELECTION METHODS FOR CLASSIFICATION TREES

Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster, and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.

A classification tree is a rule for predicting the class of an object from the values of its predictor variables. The tree is constructed by recursively partitioning a learning sample of data in which the class label and the values of the predictor variables for each case are known. Each partition is represented by a node in the tree.

Two approaches to split selection have been proposed in the statistical literature. The first and more popular approach examines all possible binary splits of the data along each predictor variable to select the split that most reduces some measure of node impurity. It is used, for example, by the THAID (Morgan and Sonquist (1963), Morgan and Messenger (1973)) and CART (Breiman, Friedman, Olshen and Stone (1984)) algorithms. If X is an ordered variable, this approach searches over all possible values c for splits of the form

    X ≤ c.    (1)
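For illustration, the following is a minimal sketch of this exhaustive search on a single ordered predictor, assuming Gini impurity as the node impurity measure (one common choice; the function names and data here are illustrative, not part of QUEST or CART):

import numpy as np

def gini_impurity(labels):
    # Gini impurity 1 - sum(p_k^2) over the class proportions p_k.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Exhaustive search for the cutpoint c in the split X <= c that
    # most reduces the weighted Gini impurity of the two child nodes.
    order = np.argsort(x)
    x, y = np.asarray(x)[order], np.asarray(y)[order]
    n = len(y)
    parent = gini_impurity(y)
    best_c, best_gain = None, 0.0
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # only distinct values yield distinct splits
        c = (x[i] + x[i - 1]) / 2.0  # midpoint between adjacent values
        child = (i / n) * gini_impurity(y[:i]) + ((n - i) / n) * gini_impurity(y[i:])
        if parent - child > best_gain:
            best_c, best_gain = c, parent - child
    return best_c, best_gain

# Example: the classes separate cleanly at x = 2.55, so the impurity
# reduction equals the parent impurity of 0.5.
x = np.array([2.0, 3.5, 1.0, 4.2, 3.1, 0.5])
y = np.array([0, 1, 0, 1, 1, 0])
print(best_split(x, y))  # (2.55, 0.5)

Because every distinct value of every predictor is a candidate cutpoint, the search is exhaustive; the selection bias described above arises because variables with more distinct values offer more candidate splits.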

[1] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems. 1936.

[2] N. L. Johnson et al. Multivariate Analysis. 1958, Nature.

[3] H. Levene. Robust tests for equality of variances. 1961.

[4] J. Morgan et al. Problems in the Analysis of Survey Data, and a Proposal. 1963.

[5] Peter Doyle et al. The Use of Automatic Interaction Detector and Similar Search Procedures. 1973.

[6] J. Morgan et al. THAID: A Sequential Analysis Program for the Analysis of Nominal Scale Dependent Variables. 1973.

[7] Ramanathan Gnanadesikan et al. Methods for Statistical Data Analysis of Multivariate Observations. 1977, A Wiley Publication in Applied Statistics.

[8] J. D. T. Oliveira et al. The Asymptotic Theory of Extreme Order Statistics. 1979.

[9] J. A. Hartigan et al. A k-means clustering algorithm. 1979.

[10] W. Loh et al. Tree-Structured Classification via Generalized Discriminant Analysis: Rejoinder. 1988.

[11] W. Loh et al. Tree-Structured Classification via Generalized Discriminant Analysis. 1988.

[12] Leo Breiman et al. Tree-Structured Classification via Generalized Discriminant Analysis: Comment. 1988.

[13] J. Angus. The Asymptotic Theory of Extreme Order Statistics. 1990.

[14] Philip A. Chou et al. Optimal Partitioning for Classification and Regression Trees. 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[15] W. Loh et al. Tree-structured proportional hazards regression modeling. 1994, Biometrics.

[16] Simon Kasif et al. A System for Induction of Oblique Decision Trees. 1994, J. Artif. Intell. Res.

[17] P. Chaudhuri et al. Piecewise polynomial regression trees. 1994.

[18] W. Loh et al. Generalized regression trees. 1995.

[19] R. Mike Cameron-Jones et al. Oversearching and Layered Search in Empirical Learning. 1995, IJCAI.

[20] Leo Breiman et al. Bias, Variance, and Arcing Classifiers. 1996.

[21] Christopher M. Bishop et al. Classification and regression. 1997.

[22] Shinichi Morishita et al. On Classification and Regression. 1998, Discovery Science.
