Efficient Sampling for Learning Sparse Additive Models in High Dimensions

We consider the problem of learning sparse additive models, i.e., functions of the form $f(x) = \sum_{l \in S} \phi_l(x_l)$, $x \in \mathbb{R}^d$, from point queries of $f$. Here $S$ is an unknown subset of coordinate variables with $|S| = k \ll d$. Assuming the $\phi_l$ are smooth, we propose a set of points at which to sample $f$ and an efficient randomized algorithm that recovers a uniform approximation to each unknown $\phi_l$. We provide a rigorous theoretical analysis of our scheme along with sample complexity bounds. Our algorithm uses recent results from compressive sensing theory along with a novel convex quadratic program for recovering robust uniform approximations to univariate functions from point queries corrupted with arbitrary bounded noise. Lastly, we theoretically analyze the impact of noise, either arbitrary but bounded or stochastic, on the performance of our algorithm.
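To make the setting concrete, the following minimal Python sketch builds a toy sparse additive model and issues point queries to it. It is not the sampling scheme or recovery algorithm of the paper: the dimension, the active set, the component functions, and the helper names `make_sparse_additive` and `query_along_axis` are all hypothetical choices made for illustration. It relies only on the identity that, for an additive $f$, $f(t e_l) - f(0) = \phi_l(t) - \phi_l(0)$ when $l \in S$ and $0$ otherwise, where $e_l$ is the $l$-th standard basis vector.

```python
import math


def make_sparse_additive(d, components):
    """Return f(x) = sum over l in S of phi_l(x[l]).

    `components` maps each active index l to its univariate function phi_l;
    its keys play the role of the (here known, in the paper unknown) set S.
    """
    def f(x):
        if len(x) != d:
            raise ValueError("x must have length d")
        return sum(phi(x[l]) for l, phi in components.items())
    return f


def query_along_axis(f, d, l, t):
    """Two point queries: f(t * e_l) - f(0).

    For an additive f this equals phi_l(t) - phi_l(0) if l is active,
    and 0 if l is inactive.
    """
    x = [0.0] * d
    base = f(x)
    x[l] = t
    return f(x) - base


# Hypothetical instance: d = 1000 coordinates, k = 3 active ones.
d = 1000
components = {7: math.sin, 123: lambda t: t ** 2, 900: math.exp}
f = make_sparse_additive(d, components)

active_value = query_along_axis(f, d, 123, 0.5)   # phi(t) = t^2, so 0.25
inactive_value = query_along_axis(f, d, 5, 0.5)   # coordinate 5 is not in S
```

Probing one coordinate at a time in this way would cost on the order of $d$ queries just to locate $S$; the abstract's appeal to compressive sensing is aimed at avoiding that cost when $k \ll d$.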
