Consistent Model Selection for Marginal Generalized Additive Model for Correlated Data

We consider the generalized additive model when responses from the same cluster are correlated. Incorporating correlation in the estimation of nonparametric components for the generalized additive model is important because it improves estimation efficiency and increases statistical power for model selection. In our setting, there is no specified likelihood function for the generalized additive model, because the outcomes could be nonnormal and discrete, which makes estimation and model selection very challenging. We propose consistent estimation and model selection procedures that incorporate the correlation structure. We establish L2-norm consistency for the estimators of the nonparametric components, which achieve the optimal rate of convergence. In addition, the proposed model selection strategy is able to select the correct generalized additive model consistently. That is, with probability approaching 1, the estimators for the zero function components converge to 0 almost surely. We illustrate our method using numerical studies with both continuous and binary responses, along with a real data application to binary periodontal data. Supplemental materials including technical details are available online.
