Rodeo: Sparse Nonparametric Regression in High Dimensions

We present a method for nonparametric regression that performs bandwidth selection and variable selection simultaneously. The approach incrementally decreases the bandwidth in directions where the gradient of the estimator with respect to the bandwidth is large. When the unknown function satisfies a sparsity condition, the method avoids the curse of dimensionality, achieving the optimal minimax rate of convergence, up to logarithmic factors, as if the relevant variables were known in advance. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests and is easy to implement. A modified version that replaces hard thresholding with soft thresholding effectively solves a sequence of lasso problems.
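To make the greedy scheme concrete, here is a minimal sketch of a rodeo-style selector at a single query point. It is an illustration under stated assumptions, not the paper's exact algorithm: a Nadaraya-Watson estimator with a Gaussian product kernel stands in for the paper's local linear smoother, the noise level is estimated by a crude difference-based rule, and the hard threshold s_j * sqrt(2 log n) is one natural reading of the hypothesis-testing description. The function name rodeo_nw and the parameters h0, beta, and h_min are placeholders introduced for this sketch.

```python
import numpy as np

def rodeo_nw(X, Y, x, h0=1.0, beta=0.9, sigma2=None, h_min=1e-3):
    """Rodeo-style greedy bandwidth/variable selection at a query point x.

    Sketch only: Nadaraya-Watson with a Gaussian product kernel stands in
    for the local linear estimator used in the paper.
    """
    n, d = X.shape
    h = np.full(d, float(h0))     # one bandwidth per coordinate
    active = set(range(d))        # coordinates whose bandwidth still shrinks

    if sigma2 is None:
        # crude difference-based noise estimate (an assumption of this sketch)
        idx = np.argsort(X[:, 0])
        sigma2 = np.mean(np.diff(Y[idx]) ** 2) / 2.0

    while active:
        # Gaussian product-kernel weights at x under the current bandwidths
        U = (x[None, :] - X) / h[None, :]
        w = np.exp(-0.5 * np.sum(U ** 2, axis=1))
        W = w.sum()
        if W <= 0:                # all weights underflowed; stop shrinking
            break

        for j in list(active):
            # derivative of each weight w.r.t. h_j for the Gaussian kernel
            dw = w * (x[j] - X[:, j]) ** 2 / h[j] ** 3
            # Z_j = d m_hat / d h_j is linear in Y:  Z_j = L @ Y
            L = dw / W - (w / W) * (dw.sum() / W)
            Z_j = L @ Y
            s_j = np.sqrt(sigma2 * np.sum(L ** 2))
            if abs(Z_j) > s_j * np.sqrt(2.0 * np.log(n)) and h[j] * beta > h_min:
                h[j] *= beta      # gradient still large: keep shrinking h_j
            else:
                active.discard(j) # freeze h_j; coordinate j is finished

    # final fit with the selected bandwidths
    U = (x[None, :] - X) / h[None, :]
    w = np.exp(-0.5 * np.sum(U ** 2, axis=1))
    return h, w @ Y / w.sum()
```

On a toy problem where only the first two of ten covariates matter, the relevant coordinates should end with small bandwidths while the irrelevant ones stay near h0:

```python
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 10))
Y = 5 * X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(scale=0.5, size=500)
h, m = rodeo_nw(X, Y, x=np.full(10, 0.5))
```

The soft-thresholding variant mentioned in the abstract would shrink each Z_j toward zero rather than applying this all-or-nothing test, which is what connects it to a sequence of lasso problems.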

[1]  J. Rice Bandwidth Choice for Nonparametric Regression , 1984 .

[2]  G. Wahba Spline models for observational data , 1990 .

[3]  J. Freidman,et al.  Multivariate adaptive regression splines , 1991 .

[4]  Jianqing Fan Design-adaptive Nonparametric Regression , 1992 .

[5]  T. Hastie,et al.  Local Regression: Automatic Kernel Carpentry , 1993 .

[6]  I. Johnstone,et al.  Ideal spatial adaptation by wavelet shrinkage , 1994 .

[7]  M. Wand,et al.  Multivariate Locally Weighted Least Squares Regression , 1994 .

[8]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[9]  D. Ruppert Empirical-Bias Bandwidths for Local Polynomial Nonparametric Regression and Density Estimation , 1997 .

[10]  E. Mammen,et al.  Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors , 1997 .

[11]  E. George,et al.  APPROACHES FOR BAYESIAN VARIABLE SELECTION , 1997 .

[12]  Federico Girosi,et al.  An Equivalence Between Sparse Approximation and Support Vector Machines , 1998, Neural Computation.

[13]  Wenjiang J. Fu,et al.  Asymptotics for lasso-type estimators , 2000 .

[14]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[15]  J. Polzehl,et al.  Structure adaptive approach for dimension reduction , 2001 .

[16]  Michael E. Tipping Sparse Bayesian Learning and the Relevance Vector Machine , 2001, J. Mach. Learn. Res..

[17]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2001, Springer Series in Statistics.

[18]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[19]  A. Juditsky,et al.  Structure Adaptive Approach for Dimension Reduction 1 , 2001 .

[20]  Adam Krzyzak,et al.  A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[21]  Neil D. Lawrence,et al.  Fast Sparse Gaussian Process Methods: The Informative Vector Machine , 2002, NIPS.

[22]  B. Efron Large-Scale Simultaneous Hypothesis Testing , 2004 .

[23]  D. Madigan Discussion of Least Angle Regression , 2003 .

[24]  Jianqing Fan,et al.  Nonconcave penalized likelihood with a diverging number of parameters , 2004, math/0406466.

[25]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[26]  Meta M. Voelker,et al.  Variable Selection and Model Building via Likelihood Basis Pursuit , 2004 .

[27]  Joel A. Tropp,et al.  Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[28]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[29]  L. Wasserman,et al.  A stochastic process approach to false discovery control , 2004, math/0406519.

[30]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[31]  V. Spokoiny,et al.  Component Identification and Estimation in Nonlinear High-Dimensional Regression Models by Structural Adaptation , 2005 .

[32]  Peter Bühlmann,et al.  Boosting, model selection, lasso and nonnegative garrote , 2005 .

[33]  C. Nachtsheim,et al.  Model‐free variable selection , 2005 .

[34]  D. Hinkley Annals of Statistics , 2006 .

[35]  D. Donoho For most large underdetermined systems of equations, the minimal 𝓁1‐norm near‐solution approximates the sparsest near‐solution , 2006 .

[36]  Joel A. Tropp,et al.  Just relax: convex programming methods for identifying sparse signals in noise , 2006, IEEE Transactions on Information Theory.