Analysis of a complexity-based pruning scheme for classification trees

A complexity-based pruning procedure for classification trees is described, and bounds on its finite-sample performance are established. The procedure selects a subtree of a (possibly random) initial tree in order to minimize a complexity-penalized measure of empirical risk, where the complexity assigned to a subtree is proportional to the square root of its size. Two cases are considered: in the first, the growing and pruning data sets are identical; in the second, they are independent. Using the performance bound, the Bayes risk consistency of pruned trees obtained via the procedure is established when the sequence of initial trees satisfies suitable geometric and structural constraints. The pruning method and its analysis are motivated by work on adaptive model selection using complexity regularization.
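To make the selection rule concrete, the following Python fragment sketches the penalized criterion. It is a minimal illustration under stated assumptions, not the paper's implementation: the names `penalized_risk`, `select_subtree`, and `lam` are hypothetical, the (errors, leaves) candidate representation is assumed, and an efficient implementation would compute optimal prunings by dynamic programming rather than enumerating subtrees explicitly.

```python
import math

# Sketch of the penalized selection rule, assuming each candidate subtree
# is summarized by (errors on the pruning sample, number of leaves).
# `lam` is a hypothetical tuning constant; the paper's penalty constant
# comes from its finite-sample bound rather than being left free.

def penalized_risk(errors: int, n: int, num_leaves: int, lam: float) -> float:
    """Empirical misclassification rate plus a sqrt(size) complexity penalty."""
    return errors / n + lam * math.sqrt(num_leaves)

def select_subtree(candidates, n: int, lam: float):
    """Return the candidate minimizing the penalized empirical risk.

    `candidates`: iterable of (errors, num_leaves) pairs, one per admissible
    pruned subtree of the initial tree (an assumed representation).
    """
    return min(candidates, key=lambda c: penalized_risk(c[0], n, c[1], lam))

# Example: three candidate prunings of a tree grown on n = 200 points.
best = select_subtree([(30, 1), (18, 4), (15, 12)], n=200, lam=0.02)
```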
