Is there sparsity beyond additive models?

Abstract. In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function of a few variables. Our goal is to devise a sparse nonparametric model that avoids the restrictive assumptions of linear or additive models. The key intuition is to measure the importance of each variable in the model through its partial derivative. Based on this idea, we propose and study a new regularizer and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm gives rise to a minimization problem that can be provably solved by an iterative procedure. The consistency properties of the resulting estimator are studied in terms of both prediction and selection performance.
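To make the derivative-based intuition concrete, the following is a minimal, self-contained sketch and not the algorithm analyzed in the paper: the estimator is a Gaussian-kernel expansion, the relevance of each input variable is measured by the empirical norm of the corresponding partial derivative of the estimator, and a smoothed version of the resulting penalty is minimized by plain gradient descent in place of the proximal scheme. All function names, the smoothing constant, and the optimizer are illustrative assumptions.

```python
# Toy sketch of derivative-based sparsity (illustrative, not the paper's method):
# f(x) = sum_j c_j k(x_j, x) with a Gaussian kernel; each variable's relevance is
# the empirical norm of the corresponding partial derivative of f on the data.
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """Gram matrix K[i, j] = exp(-||x1_i - x2_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def derivative_matrices(X, sigma):
    """Z[a][i, j] = d/dx^a k(x_j, x) evaluated at x = x_i (Gaussian kernel)."""
    K = gaussian_kernel(X, X, sigma)
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j, a] = x_i^a - x_j^a
    return [-(diff[:, :, a] * K) / sigma ** 2 for a in range(X.shape[1])]

def fit(X, y, tau=0.1, sigma=1.0, lr=1e-2, iters=2000, eps=1e-6):
    """Minimize (1/n)||Kc - y||^2 + tau * sum_a sqrt((1/n)||Z_a c||^2)
    with a small eps-smoothing of the square roots (gradient descent)."""
    n, d = X.shape
    K = gaussian_kernel(X, X, sigma)
    Z = derivative_matrices(X, sigma)
    c = np.zeros(n)
    for _ in range(iters):
        grad = (2 / n) * (K.T @ (K @ c - y))      # gradient of the data-fit term
        for a in range(d):                        # gradient of the smoothed penalty
            za = Z[a] @ c
            grad += tau * (Z[a].T @ za) / (n * (np.sqrt(za @ za / n) + eps))
        c -= lr * grad
    # relevance of variable a = empirical norm of the a-th partial derivative
    scores = np.array([np.sqrt((Z[a] @ c) @ (Z[a] @ c) / n) for a in range(d)])
    return c, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(80, 5))
    # target depends only on the first two of five variables
    y = np.sin(2 * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(80)
    _, scores = fit(X, y)
    print(np.round(scores / scores.max(), 2))     # first two scores should dominate
```

In this sketch the penalty plays the role of a group-lasso-type norm on the partial derivatives, so variables on which the fitted function does not depend receive scores close to zero; the paper's proximal forward-backward procedure handles the nonsmooth penalty directly instead of smoothing it.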
