Generalization Bounds for Decision Trees
The problem of over-fitting is central to both the theory and practice of machine learning. Intuitively, one over-fits by using too many parameters in the concept, e.g., fitting an nth-order polynomial to n data points. One under-fits by using too few parameters, e.g., fitting a linear curve to clearly quadratic data. The fundamental question is how many parameters, or what concept size, one should allow for a given amount of training data. A standard theoretical approach is to prove a bound on generalization error as a function of the training error and the concept size (or VC dimension). One can then select a concept minimizing this bound, i.e., optimizing a certain tradeoff, as expressed in the bound, between training error and concept size. Bounds on generalization error that express a tradeoff between the training error and the size of the concept are often called structural risk minimization (SRM) formulas. A variety of SRM bounds have been proved in the literature [Vap82]. The following SRM bound was proved in [McA98] and, for completeness, is proved again in Section 2. It states that with probability at least 1 − δ over the sample S we have the following.
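To illustrate the selection procedure described above, the following is a minimal sketch of SRM-style model selection using a generic Occam bound. The particular penalty term here (a standard Chernoff-style bound for a concept described in a given number of bits) is an assumption for illustration, not the specific bound proved in the paper; the function and variable names are hypothetical.

```python
import math

def occam_bound(train_err, size_bits, m, delta=0.05):
    """Generic Occam-style SRM bound (illustrative, not the paper's formula):
    with probability >= 1 - delta over a sample of m examples, the true
    error of a concept describable in size_bits bits is at most
    train_err + sqrt((size_bits * ln 2 + ln(1/delta)) / (2 * m))."""
    penalty = math.sqrt((size_bits * math.log(2) + math.log(1 / delta)) / (2 * m))
    return train_err + penalty

def select_by_srm(candidates, m, delta=0.05):
    """SRM selection: among (train_err, size_bits) candidates, pick the one
    minimizing the bound, i.e., the tradeoff between fit and concept size."""
    return min(candidates, key=lambda c: occam_bound(c[0], c[1], m, delta))
```

For example, with 1000 training examples, a small concept with high training error (0.20 error, 8 bits), a large concept with low training error (0.05 error, 200 bits), and a moderate one (0.10 error, 40 bits), the bound favors the moderate concept: the large concept's complexity penalty outweighs its better fit.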
[1] Vladimir Vapnik et al. Statistical Learning Theory, 1998.
[2] Yishay Mansour et al. A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization, 1998, ICML.
[3] Yoav Freund et al. Self Bounding Learning Algorithms, 1998, COLT '98.
[4] Peter L. Bartlett et al. Neural Network Learning: Theoretical Foundations, 1999.
[5] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.