Universal Algorithms for Learning Theory. Part II: Piecewise Polynomial Functions

This paper is concerned with estimating the regression function $f_\rho$ in supervised learning by utilizing piecewise polynomial approximations on adaptively generated partitions. The main point of interest is algorithms that, with high probability, are optimal in terms of the least squares error achieved for a given number $m$ of observed data. In a previous paper [1], we developed, for each $\beta > 0$, an algorithm for piecewise constant approximation which is proven to provide such optimal order estimates with probability larger than $1 - m^{-\beta}$. In this paper we consider the case of higher-degree polynomials. We show that for general probability measures $\rho$, empirical least squares minimization will not provide optimal error estimates with high probability. We go further and identify certain conditions on the probability measure $\rho$ that allow optimal estimates with high probability.
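To make the setting concrete, the following is a minimal sketch of empirical least squares minimization over piecewise polynomials on a fixed partition of $[0,1]$. It is not the paper's adaptive algorithm: the adaptive partition generation of [1] is replaced here by a fixed uniform partition, and the function names (`piecewise_poly_lsq`, `evaluate`) and the fallback rule on under-sampled cells are illustrative assumptions.

```python
import numpy as np

def piecewise_poly_lsq(x, y, breakpoints, degree):
    """Fit a least squares polynomial of the given degree on each cell
    of the partition of [0, 1] defined by `breakpoints`."""
    coefs = []
    for a, b in zip(breakpoints[:-1], breakpoints[1:]):
        mask = (x >= a) & (x < b)
        if mask.sum() > degree:
            # Enough samples in the cell to determine a degree-`degree` fit.
            c = np.polyfit(x[mask], y[mask], degree)
        else:
            # Under-determined cell: fall back to the sample mean (an
            # arbitrary choice for this sketch).
            c = np.zeros(degree + 1)
            c[-1] = y[mask].mean() if mask.any() else 0.0
        coefs.append(c)
    return coefs

def evaluate(x, breakpoints, coefs):
    """Evaluate the piecewise polynomial estimator at the points x."""
    idx = np.clip(np.searchsorted(breakpoints, x, side="right") - 1,
                  0, len(coefs) - 1)
    return np.array([np.polyval(coefs[i], xi) for i, xi in zip(idx, x)])

# Example: m = 200 noisy samples, uniform partition into 8 cells.
# An adaptive scheme would refine or coarsen these cells based on the data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
bp = np.linspace(0, 1, 9)
fit = piecewise_poly_lsq(x, y, bp, degree=2)
print(evaluate(np.array([0.1, 0.5, 0.9]), bp, fit))
```

The paper's negative result concerns precisely this kind of per-cell least squares fit: for general measures $\rho$, cells may carry too few or too unevenly distributed samples for the fitted polynomial to be reliable with high probability, which is why additional conditions on $\rho$ are needed.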