Model selection for CART regression trees

This paper studies the performance of the classification and regression trees (CART) pruning algorithm, together with the final discrete selection by test sample, as a functional estimation procedure. The primary aim is to validate the pruning procedure in two settings: Gaussian regression and bounded regression. On the one hand, the paper shows that the complexity penalty used in the pruning algorithm is valid in both settings; on the other hand, it shows that, conditionally on the construction of the maximal tree, the final selection does not dramatically alter the estimation accuracy of the regression function. In both settings, the risk bounds that are proved, obtained through penalized model selection, validate the CART algorithm, which is used in many applications such as meteorology, biology, medicine, pollution monitoring, and image coding.
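
To make the two steps named above concrete, the following is a minimal sketch of the penalized criterion and the test-sample selection as they are usually written in the CART literature. All notation is an assumption introduced here for illustration and does not appear in the abstract: $T_{\max}$ is the maximal tree, $T \preceq T_{\max}$ a pruned subtree with $|\widetilde{T}|$ leaves, $\hat{s}_T$ the piecewise-constant least-squares estimator on the partition induced by $T$, $(X_i, Y_i)_{1 \le i \le n}$ the sample used for pruning, and $(X'_j, Y'_j)_{1 \le j \le m}$ an independent test sample. The normalization of the penalty by $n$ is one common convention; other presentations absorb it into $\alpha$.

```latex
% Empirical quadratic risk of the estimator attached to subtree T (assumed notation)
\gamma_n(\hat{s}_T) = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - \hat{s}_T(X_i) \bigr)^2

% Pruning: for each temperature \alpha \ge 0, minimize the cost-complexity criterion,
% i.e. the empirical risk plus a penalty proportional to the number of leaves
\operatorname{crit}_\alpha(T) = \gamma_n(\hat{s}_T) + \alpha \, \frac{|\widetilde{T}|}{n},
\qquad
T_\alpha \in \operatorname*{arg\,min}_{T \preceq T_{\max}} \operatorname{crit}_\alpha(T)

% As \alpha increases, the minimizers form a finite nested sequence of subtrees
T_{\max} \succeq T_1 \succ T_2 \succ \dots \succ T_K

% Final discrete selection: keep the subtree with the smallest test-sample risk
\hat{k} \in \operatorname*{arg\,min}_{1 \le k \le K}
\frac{1}{m} \sum_{j=1}^{m} \bigl( Y'_j - \hat{s}_{T_k}(X'_j) \bigr)^2
```

Read this way, the "complexity penalty" is the term $\alpha |\widetilde{T}| / n$, linear in the number of leaves, and the "final discrete selection" is the choice of $\hat{k}$ among the finitely many pruned subtrees.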