Paper Information - FBP: A Frontier-Based Tree-Pruning Algorithm

A frontier-based tree-pruning algorithm (FBP) is proposed. Its order of computational complexity is comparable to that of cost-complexity pruning (CCP). For tree pruning, it provides a full spectrum of information: (1) given a value of the penalization parameter λ, it gives the decision tree specified by the complexity-penalization approach; (2) given the size of a decision tree, it provides the range of λ within which the complexity-penalization approach yields this tree size; (3) it finds the tree sizes that are inadmissible: no matter what value the penalization parameter takes, the tree produced by a complexity-penalization framework will never have these sizes. Simulations on real data sets reveal a “surprise”: in the complexity-penalization approach, most tree sizes are inadmissible. FBP facilitates a more faithful implementation of cross validation (CV), an implementation that the simulations favor. Using FBP, a stability analysis of CV is proposed.
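The three kinds of information listed above can be illustrated with a minimal sketch. It does not reproduce FBP itself, which operates on the tree. It assumes that the smallest achievable error for each tree size is already known, and it assumes the penalized objective has the form error + λ × size. Under those assumptions, the admissible sizes are the vertices of the lower convex frontier of the (size, error) points, and the slopes between neighboring vertices give the λ ranges. The function names and the numbers in `errors` are hypothetical and chosen only for illustration.

```python
from math import inf


def frontier(errors):
    """Return the admissible (size, error) pairs, ordered by size.

    `errors` maps a tree size to the smallest error among subtrees of
    that size (assumed to be given; computing it is not shown here).
    """
    hull = []
    for k, e in sorted(errors.items()):
        # A larger tree that does not reduce the error is never chosen.
        if hull and e >= hull[-1][1]:
            continue
        # Drop earlier vertices that lie on or above the segment to (k, e).
        while len(hull) >= 2:
            (k1, e1), (k2, e2) = hull[-2], hull[-1]
            if (e2 - e1) * (k - k1) >= (e - e1) * (k2 - k1):
                hull.pop()
            else:
                break
        hull.append((k, e))
    return hull


def lambda_ranges(hull):
    """Map each admissible size to the (lower, upper) range of lambda."""
    ranges = {}
    upper = inf
    for i, (k, e) in enumerate(hull):
        if i + 1 < len(hull):
            k_next, e_next = hull[i + 1]
            lower = (e - e_next) / (k_next - k)
        else:
            lower = 0.0
        ranges[k] = (lower, upper)
        upper = lower
    return ranges


def best_size(errors, lam):
    """Size minimizing error + lam * size (ties go to the smaller tree)."""
    return min(sorted(errors), key=lambda k: errors[k] + lam * k)


# Hypothetical per-size minimal errors, for illustration only.
errors = {1: 10.0, 2: 9.0, 3: 5.0, 4: 4.5, 5: 4.0, 6: 4.0}
hull = frontier(errors)
admissible = [k for k, _ in hull]
inadmissible = [k for k in sorted(errors) if k not in admissible]
ranges = lambda_ranges(hull)
```

With these made-up numbers, sizes 1, 3, and 5 are admissible and sizes 2, 4, and 6 are never selected for any positive λ. Size 3, for example, is selected for λ between 0.5 and 2.5, so `best_size(errors, 1.0)` returns 3.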
