Universal Algorithms for Learning Theory, Part I: Piecewise Constant Functions

This paper is concerned with the construction and analysis of a universal estimator for the regression problem in supervised learning. Universal means that the estimator does not depend on any a priori assumptions about the regression function to be estimated. The estimator studied here consists of a least-squares fit by piecewise constant functions on a partition that adapts to the data. The partition is generated by a splitting procedure that differs from those used in CART algorithms. It is proved that this estimator achieves the optimal convergence rate for a wide class of priors on the regression function. Namely, as made precise in the text, if the regression function belongs to any one of a certain class of approximation spaces (or smoothness spaces of order not exceeding one, a limitation that results from the use of piecewise constants), measured relative to the marginal measure, then the estimator converges to the regression function (in the least-squares sense) at the optimal rate in terms of the number of samples. The estimator is also numerically feasible and can be implemented online.
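To make the flavor of such an estimator concrete, the following is a minimal one-dimensional sketch: cells of a dyadic partition of [0, 1) are split greedily whenever splitting reduces the empirical squared error by more than a fixed threshold, and the least-squares constant (the sample mean) is fitted on each resulting cell. The function names (`build_partition`, `evaluate`) and the fixed `threshold` parameter are illustrative assumptions, not the paper's construction; the paper's splitting rule is calibrated to the number of samples and differs in its precise form.

```python
import numpy as np

def cell_average(y):
    """Least-squares constant fit on a cell: the sample mean (0 if the cell is empty)."""
    return float(y.mean()) if len(y) > 0 else 0.0

def local_error(y):
    """Empirical squared error of the constant fit on a cell."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) > 0 else 0.0

def build_partition(x, y, lo=0.0, hi=1.0, threshold=0.1, max_depth=12, depth=0):
    """Greedily refine a dyadic partition of [lo, hi): split a cell in half
    whenever the split reduces the empirical squared error by more than
    `threshold` (an assumed, fixed tolerance). Returns leaves (lo, hi, value)."""
    in_cell = (x >= lo) & (x < hi)
    mid = 0.5 * (lo + hi)
    parent = y[in_cell]
    left = y[in_cell & (x < mid)]
    right = y[in_cell & (x >= mid)]
    gain = local_error(parent) - local_error(left) - local_error(right)
    if depth >= max_depth or gain <= threshold:
        return [(lo, hi, cell_average(parent))]
    return (build_partition(x, y, lo, mid, threshold, max_depth, depth + 1)
            + build_partition(x, y, mid, hi, threshold, max_depth, depth + 1))

def evaluate(leaves, t):
    """Evaluate the piecewise constant estimator at a point t in [0, 1)."""
    for lo, hi, c in leaves:
        if lo <= t < hi:
            return c
    return 0.0

if __name__ == "__main__":
    # Synthetic regression data: noisy samples of a smooth function.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 500)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(500)
    leaves = build_partition(x, y, threshold=0.5)
    print(len(leaves), evaluate(leaves, 0.3))
```

This greedy gain criterion is only a stand-in for the paper's splitting rule, but it illustrates why a piecewise constant estimator on an adaptive tree partition can be maintained online: the cell means and sums of squares that drive the splitting decisions can be updated one sample at a time.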
