High dimensional variable selection for gene-environment interactions

Abstract: Gene-environment (G×E) interaction plays a pivotal role in understanding the genetic basis of complex disease. When environmental factors are measured on a continuous scale, one can assess how genetic sensitivity to a disease phenotype changes across environmental conditions. Motivated by the increasing awareness that gene-set-based association analysis can be more powerful than single-variant-based approaches, we proposed an additive varying-coefficient model to jointly model variants in a genetic system. The model allows us to examine how variants in a set are mediated by one or multiple environmental factors to affect a disease phenotype. We approached the problem from a high-dimensional variable selection perspective. In particular, we can select variants with varying, constant and zero coefficients, which correspond to G×E interaction, no G×E interaction and no genetic effect, respectively. The procedure was implemented through a two-stage iterative estimation algorithm using the Smoothly Clipped Absolute Deviation (SCAD) penalty function. Under certain regularity conditions, we established the consistency of the two-stage iterative estimators in both variable selection and effect separation, and showed that the estimates of the varying effects achieve the optimal convergence rates. In addition, we showed that the estimates of the non-zero constant coefficients enjoy the oracle property. The utility of our procedure was demonstrated through simulation studies and real data analysis.
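For readers unfamiliar with the SCAD penalty named above, the following is a minimal sketch of its standard piecewise form for a single scalar coefficient. The function name `scad_penalty`, its parameter names, and the default `a = 3.7` (a commonly used value) are assumptions for illustration; the abstract does not state how the penalty is applied to the varying and constant components, so this should not be read as the authors' implementation.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty for one coefficient (illustrative sketch).

    theta : coefficient value
    lam   : tuning parameter (lambda >= 0)
    a     : shape parameter (must exceed 2); 3.7 is an assumed default
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Near zero: behaves like the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Middle zone: quadratic piece that smoothly flattens the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(value, scad_penalty(value, lam=1.0))
```

The penalty is linear near zero, which encourages exact zeros, and constant beyond `a * lam`, which leaves large coefficients nearly unpenalized; this combination is what allows SCAD-type estimators to attain oracle-type properties under suitable conditions.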
