The instability problem of decision tree classification algorithms is that small changes in the input training samples can cause dramatically large changes in the output classification rules. Different rules generated from nearly identical training samples run counter to human intuition and complicate decision making. In this paper, we present fundamental theorems for the instability problem of decision tree classifiers. The first theorem gives the relationship between a change in the data and the resulting change in the tree structure (i.e., a split change). The second theorem, the Instability Theorem, identifies the cause of the instability problem. Based on these two theorems, algorithmic improvements can be made that lessen the instability problem. Empirical results illustrate the theorem statements. The trees constructed by the proposed algorithm are more stable, noise-tolerant, informative, expressive, and concise. Our proposed sensitivity measure can serve as a metric for evaluating the stability of splitting predicates, and the tree sensitivity indicates the confidence level in the rules and their effective lifetime.
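To make the instability phenomenon concrete, here is a minimal sketch (not the paper's algorithm or its sensitivity measure) that fits two scikit-learn decision trees on nearly identical training samples and compares the split chosen at the root; the dataset, the five-row perturbation, and the `root_split` helper are illustrative assumptions.

```python
# Illustrative sketch of decision tree instability: removing a handful of
# training samples can change the split selected at the root of the tree,
# and with it every rule (root-to-leaf path) below that split.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

def root_split(tree_clf, feature_names):
    """Return the (feature, threshold) pair used at the root of a fitted tree."""
    t = tree_clf.tree_
    return feature_names[t.feature[0]], t.threshold[0]

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

# Tree trained on the full sample.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Tree trained on an "almost identical" sample: drop 5 randomly chosen rows.
rng = np.random.default_rng(1)
dropped = rng.choice(len(X), size=5, replace=False)
keep = np.delete(np.arange(len(X)), dropped)
perturbed = DecisionTreeClassifier(random_state=0).fit(X[keep], y[keep])

print("full tree root split:     ", root_split(full, names))
print("perturbed tree root split:", root_split(perturbed, names))
```

When the two printed splits differ, the two trees encode different rule sets despite agreeing on over 99% of the training data; repeating the comparison over many perturbations would give one crude, resampling-based notion of how sensitive a splitting predicate is, in the spirit of (but not identical to) the sensitivity measure proposed in the paper.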