VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size, but the number of nonzero additive components is "small" relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with that of the underlying model, and under which the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.
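
The following is a sketch of the two-stage procedure described above, written in standard notation; the specific symbols (the basis functions \phi_k, the design vectors Z_{ij}, the tuning parameters \lambda_{n1}, \lambda_{n2}, and the weights w_{nj}) are illustrative labels, since the abstract itself does not fix a notation. The model is Y_i = \mu + \sum_{j=1}^{p} f_j(X_{ij}) + \varepsilon_i, where only a small number of the components f_j are nonzero. Each component is approximated by a truncated B-spline expansion, f_j(x) \approx \sum_{k=1}^{m_n} \beta_{jk}\phi_k(x), and Z_{ij} denotes the vector of (centered) basis values (\phi_1(X_{ij}), \dots, \phi_{m_n}(X_{ij}))'. Selecting the nonzero components then amounts to selecting the nonzero coefficient groups \beta_j = (\beta_{j1}, \dots, \beta_{jm_n})'. The first stage is the group Lasso,

\tilde{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} Z_{ij}'\beta_j \Big)^2 + \lambda_{n1} \sum_{j=1}^{p} \|\beta_j\|_2 ,

which serves as the initial, dimension-reducing estimator. The second stage is the adaptive group Lasso, which reweights the penalty by the first-stage estimates,

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} Z_{ij}'\beta_j \Big)^2 + \lambda_{n2} \sum_{j=1}^{p} w_{nj} \|\beta_j\|_2 , \qquad w_{nj} = \|\tilde{\beta}_j\|_2^{-1} \ \text{if } \tilde{\beta}_j \neq 0, \quad w_{nj} = \infty \ \text{otherwise},

so that groups set to zero in the first stage remain excluded, while the remaining groups are penalized less heavily the larger their initial estimates. The selected components are those j with \|\hat{\beta}_j\|_2 > 0.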
