Variable selection for high-dimensional generalized varying-coefficient models

In this paper, we consider the problem of variable selection for high-dimensional generalized varying-coefficient models and propose a polynomial-spline based procedure that simultaneously eliminates irrelevant predictors and estimates the nonzero coefficients. In a "large p, small n" setting, we establish the convergence rates of the estimator under suitable regularity assumptions. In particular, we show that the adaptive group lasso estimator correctly selects the important variables with probability approaching one, and that the convergence rates for the nonzero coefficients are the same as those of the oracle estimator (the estimator obtained when the important variables are known before the statistical analysis is carried out). To choose the regularization parameters automatically, we use the extended Bayesian information criterion (eBIC), which effectively controls the number of false positives. Monte Carlo simulations are conducted to examine the finite-sample performance of the proposed procedures.
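As a rough illustration of the tuning step, the sketch below scores candidate models with the general form of the extended Bayesian information criterion, eBIC = -2 log L + df log n + 2 gamma log C(p, s), where s is the number of selected predictors among p candidates. The function names, the separate `df` argument (meant to count spline coefficients when each predictor contributes a group of basis functions), and the default `gamma` are assumptions made for illustration; they are not taken from the paper.

```python
import math


def log_binom(p, k):
    """Log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik, n, p, n_selected, df, gamma=1.0):
    """Extended BIC for one candidate model (hypothetical helper).

    loglik     : maximized log-likelihood of the candidate model
    n          : sample size
    p          : total number of candidate predictors
    n_selected : number of predictors kept in the model
    df         : number of free parameters (assumed here to be the
                 total count of spline coefficients in the model)
    gamma      : eBIC tuning constant; gamma = 0 gives ordinary BIC
    """
    return (-2.0 * loglik
            + df * math.log(n)
            + 2.0 * gamma * log_binom(p, n_selected))


def select_by_ebic(candidates, n, p, gamma=1.0):
    """Return the index of the candidate with the smallest eBIC.

    candidates is a list of (loglik, n_selected, df) tuples, for
    example one tuple per value on a regularization-parameter grid.
    """
    scores = [ebic(ll, n, p, s, df, gamma) for ll, s, df in candidates]
    return min(range(len(scores)), key=scores.__getitem__)
```

The extra term 2 gamma log C(p, s) grows quickly with s when p is large, which is how the criterion discourages models with many selected predictors.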
