Brownian Motions and Scrambled Wavelets for Least-Squares Regression

We consider ordinary (non-penalized) least-squares regression where the regression function is chosen in a randomly generated subspace G_P \subset S of finite dimension P, where S is a function space of infinite dimension, e.g. L^2([0, 1]^d). G_P is defined as the span of P random features that are linear combinations of the basis functions of S weighted by i.i.d. Gaussian random coefficients. We characterize the so-called kernel space K \subset S of the resulting Gaussian process and derive approximation error bounds of order O(||f||^2_K log(P)/P) for functions f \in K approximated in G_P. We apply this result to derive excess risk bounds for the least-squares estimate in various spaces. For illustration, we consider regression using the so-called scrambled wavelets (i.e. random linear combinations of wavelets of L^2([0, 1]^d)) and derive an excess risk rate O(||f*||_K (log N)/\sqrt{N}), which is arbitrarily close to the minimax optimal rate (up to a logarithmic factor) for target functions f* in K = H^s([0, 1]^d), a Sobolev space of smoothness order s > d/2. We describe an efficient implementation using lazy expansions with numerical complexity \tilde{O}(2^d N^{3/2} log N + N^{5/2}), where d is the dimension of the input data and N is the number of data points.
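
To make the construction concrete, the following is a minimal one-dimensional sketch, not the implementation described above. It assumes several things that the abstract does not specify: the infinite basis of S is truncated to a finite number of multiscale hat functions, the functions at level j are rescaled by 2^{-j/2}, the number of features is set to P = \sqrt{N}, and the target function and noise level are invented for the demonstration. The names `hat_basis`, `random_features`, `fit_least_squares` and `predict` are hypothetical. Each random feature is a Gaussian-weighted combination of the basis functions, and the estimate is the ordinary least-squares fit in the span of the P features.

```python
import numpy as np

def hat_basis(x, max_level):
    """Evaluate rescaled hat functions on [0, 1] at the points x.

    Assumption: truncated multiscale basis with levels 0..max_level,
    level j scaled by 2^(-j/2). Returns an array of shape (len(x), F),
    with F = 2^(max_level + 1) - 1.
    """
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(max_level + 1):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k  # maps the support [k/2^j, (k+1)/2^j] to [0, 1]
            hat = np.clip(1.0 - np.abs(2.0 * t - 1.0), 0.0, None)
            cols.append(2.0 ** (-j / 2.0) * hat)
    return np.column_stack(cols)

def random_features(x, A, max_level):
    """Random features psi_p(x) = sum_i A[p, i] * phi_i(x), shape (len(x), P)."""
    return hat_basis(x, max_level) @ A.T

def fit_least_squares(x, y, P, max_level, rng):
    """Draw i.i.d. Gaussian coefficients and solve ordinary least squares."""
    F = 2 ** (max_level + 1) - 1
    A = rng.standard_normal((P, F))           # one row of coefficients per feature
    Psi = random_features(x, A, max_level)    # design matrix, shape (N, P)
    coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return A, coef

def predict(x, A, coef, max_level):
    """Evaluate the fitted regression function at new points x."""
    return random_features(x, A, max_level) @ coef

# Illustrative data (assumed): noisy samples of a smooth target on [0, 1].
rng = np.random.default_rng(0)
N = 200
max_level = 6
x = rng.uniform(0.0, 1.0, N)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(N)

P = int(np.sqrt(N))                           # assumed choice of P
A, coef = fit_least_squares(x, y, P, max_level, rng)
y_grid = predict(np.linspace(0.0, 1.0, 5), A, coef, max_level)
```

The sketch evaluates every basis function at every data point, so it does not reproduce the lazy expansions mentioned above; it only shows how the random subspace G_P and the least-squares estimate fit together.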
