Estimation and Inference in Generalized Additive Coefficient Models for Nonlinear Interactions with High-Dimensional Covariates.

In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a procedure based on groupwise penalization to identify significant covariates in the "large p small n" setting, and we show that the procedure is consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator, using the smoothed bootstrap method to estimate the standard deviation of the functional estimator. We also discuss how to choose the tuning parameters. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and, as an illustration, analyze an obesity data set from a genome-wide association study.
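To make the idea of groupwise penalization concrete, the following is a minimal sketch of a plain group lasso fitted by proximal gradient descent. It is an illustration under stated assumptions, not the estimator studied in this paper: it uses a linear least-squares loss rather than the GACM quasi-likelihood, and the function names `group_soft_threshold` and `group_lasso`, the penalty level `lam`, and the simulated data are all hypothetical. In a spline-based setting, each group would hold the basis coefficients of one coefficient function, so zeroing a group removes that function from the model.

```python
import numpy as np


def group_soft_threshold(beta, groups, thresh):
    """Shrink each coefficient group toward zero.

    A group whose Euclidean norm is at most `thresh` is set exactly to zero,
    which is what removes a whole covariate (or function) from the model.
    """
    out = np.zeros_like(beta)
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * beta[idx]
    return out


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - X b||^2 + lam * sum_g ||b_g||_2.

    Illustrative only: unweighted group penalty, fixed step size.
    """
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta


# Simulated data (hypothetical): three groups of two columns each,
# with only the first group carrying signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
true_beta = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(200)
groups = [[0, 1], [2, 3], [4, 5]]

beta_hat = group_lasso(X, y, groups, lam=0.2)
selected = [g for g, idx in enumerate(groups) if np.linalg.norm(beta_hat[idx]) > 0]
print("selected groups:", selected)
```

Because the penalty acts on the norm of each group rather than on individual coefficients, a group is either kept or dropped as a whole. That all-or-nothing behavior is the property that makes this kind of penalty suitable for structure identification.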
