Pruning Decision Trees and Lists

Machine learning algorithms are techniques that automatically build models describing the structure at the heart of a set of data. Ideally, such models can be used to predict properties of future data points and people can use them to analyze the domain from which the data originates. Decision trees and lists are potentially powerful predictors and embody an explicit representation of the structure in a dataset. Their accuracy and comprehensibility depend on how concisely the learning algorithm can summarize this structure. The final model should not incorporate spurious effects—patterns that are not genuine features of the underlying domain. Given an efficient mechanism for determining when a particular effect is due to chance alone, non-predictive parts of a model can be eliminated or “pruned.” Pruning mechanisms require a sensitive instrument that uses the data to detect whether there is a genuine relationship between the components of a model and the domain. Statistical significance tests are theoretically well-founded tools for doing exactly that. This thesis presents pruning algorithms for decision trees and lists that are based on significance tests. We explain why pruning is often necessary to obtain small and accurate models and show that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account. We compare the effect of parametric and non-parametric tests, analyze why current pruning algorithms for decision lists often prune too aggressively, and review related work—in particular existing approaches that use significance tests in the context of pruning. The main outcome of this investigation is a set of simple pruning algorithms that should prove useful in practical data mining applications.
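
To make the central idea concrete, the sketch below shows one way a significance test can drive a pruning decision: build the contingency table of child node versus class for a candidate split and keep the split only if the association is statistically significant, judged either by a parametric chi-squared test or by a non-parametric permutation test. This is a minimal illustrative sketch, not the thesis's actual procedure; the function names, the 0.05 level, and the permutation count are assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2_contingency

def contingency(child_ids, labels):
    """Class-count table for a candidate split: rows are the children the
    split sends instances to, columns are the class labels (both 0..k-1)."""
    child_ids, labels = np.asarray(child_ids), np.asarray(labels)
    table = np.zeros((child_ids.max() + 1, labels.max() + 1), dtype=int)
    np.add.at(table, (child_ids, labels), 1)
    return table

def chi2_of(table):
    """Parametric test: chi-squared statistic and p-value for independence
    between split membership and class."""
    stat, p, _, _ = chi2_contingency(table)
    return stat, p

def permutation_p(child_ids, labels, n_perm=1000, seed=0):
    """Non-parametric alternative: shuffle the class labels repeatedly and
    count how often chance alone yields an association at least as strong."""
    rng = np.random.default_rng(seed)
    observed, _ = chi2_of(contingency(child_ids, labels))
    labels = np.asarray(labels)
    hits = sum(chi2_of(contingency(child_ids, rng.permutation(labels)))[0] >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def should_prune(child_ids, labels, alpha=0.05, parametric=True):
    """Prune the split when the split/class association is not significant,
    i.e. when the observed pattern could plausibly be due to chance alone."""
    if parametric:
        _, p = chi2_of(contingency(child_ids, labels))
    else:
        p = permutation_p(child_ids, labels)
    return p > alpha

# Toy usage: instances 0-3 reach child 0, instances 4-7 reach child 1.
child_ids  = [0, 0, 0, 0, 1, 1, 1, 1]
pure_split = [0, 0, 0, 0, 1, 1, 1, 1]   # split separates the two classes
no_pattern = [0, 1, 0, 1, 0, 1, 0, 1]   # children mirror the parent distribution
print(should_prune(child_ids, pure_split))   # False: keep the split
print(should_prune(child_ids, no_pattern))   # True: prune it
```

The permutation variant avoids relying on the chi-squared distribution's asymptotic approximation, which is questionable for the small counts typical near the leaves of a tree, at the cost of extra computation; that trade-off is one aspect of the parametric versus non-parametric comparison mentioned above.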
