Variable Selection in Nonparametric Varying-Coefficient Models for Analysis of Repeated Measurements

Nonparametric varying-coefficient models are commonly used for analyzing data measured repeatedly over time, including longitudinal and functional response data. Although many procedures have been developed for estimating varying coefficients, the problem of variable selection for such models has not been addressed to date. In this article we present a regularized estimation procedure for variable selection that combines basis function approximations and the smoothly clipped absolute deviation penalty. The proposed procedure simultaneously selects significant variables with time-varying effects and estimates the nonzero smooth coefficient functions. Under suitable conditions, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. Here the oracle property means that the asymptotic distribution of an estimated coefficient function is the same as that when it is known a priori which variables are in the model. The method is illustrated with simulations and two real data examples, one for identifying risk factors in the study of AIDS and one using microarray time-course gene expression data to identify the transcription factors related to the yeast cell-cycle process.
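As a rough illustration of the penalty named above, the sketch below evaluates the smoothly clipped absolute deviation (SCAD) penalty in its standard piecewise form, due to Fan and Li (2001), and applies it to the norm of each variable's basis coefficients. Penalizing a whole coefficient function at once is what lets the function be set to zero as a unit. The function names, the use of a plain Euclidean norm for each group, and the default a = 3.7 are assumptions made for this example; the abstract does not specify these details.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at |theta|.

    lam is the tuning parameter; a > 2 controls where the penalty
    flattens out (3.7 is a commonly used default, assumed here).
    """
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no extra shrinkage.
    return (a + 1) * lam * lam / 2


def group_scad_penalty(coef_groups, lam, a=3.7):
    """Sum of SCAD penalties on the norm of each group of basis coefficients.

    coef_groups is a list of lists; each inner list holds the basis
    coefficients of one coefficient function (hypothetical layout).
    The Euclidean norm is an illustrative choice of group norm.
    """
    total = 0.0
    for group in coef_groups:
        norm = math.sqrt(sum(c * c for c in group))
        total += scad_penalty(norm, lam, a)
    return total
```

In a regularized fit of this kind, a term such as `group_scad_penalty` would be added to a least-squares criterion built from the basis expansion. A group whose norm is driven to zero corresponds to a variable dropped from the model.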
