Generalization Bounds for Decision Trees

The problem of over-fitting is central to both the theory and practice of machine learning. Intuitively, one over-fits by using too many parameters in the concept, e.g., fitting an nth order polynomial to n data points. One under-fits by using too few parameters, e.g., fitting a linear curve to clearly quadratic data. The fundamental question is how many parameters, or what concept size, one should allow for a given amount of training data. A standard theoretical approach is to prove a bound on generalization error as a function of the training error and the concept size (or VC dimension). One can then select a concept minimizing this bound, i.e., optimizing a certain tradeoff, as expressed in the bound, between training error and concept size. Bounds on generalization error that express a tradeoff between the training error and the size of the concept are often called structural risk minimization (SRM) formulas. A variety of SRM bounds have been proved in the literature [Vap82]. The following SRM bound was proved in [McA98] and, for completeness, is proved again in Section 2. It states that with probability at least 1 − δ over the sample S we have the following.