Sparse estimation of gene–gene interactions in prediction models

Gene–gene interactions are currently assessed mostly by separate parallel analyses, in which each interaction term is tested on its own; much less attention has been paid to estimating the interaction terms simultaneously in a prediction model. Because the number of interaction terms grows rapidly with the number of genes, sparse estimation is desirable for both statistical and interpretability reasons. There is a large literature on sparse estimation, but the natural hierarchy between an interaction and its corresponding main effects requires special consideration. We describe random-effect models that impose sparse estimation of interactions under both strong- and weak-hierarchy constraints. We develop an estimation procedure based on the hierarchical-likelihood argument and show that the modelling approach is equivalent to a penalty-based method, with the advantage that the models are more transparent and flexible. We compare the procedure with some standard methods in a simulation study and illustrate its application by fitting a gene–gene interaction model to predict body-mass index.
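
To make the two hierarchy constraints concrete, the following minimal Python sketch counts the pairwise interaction terms among p predictors and checks whether a selected model respects a hierarchy. It assumes the usual definitions: under strong hierarchy an interaction may be non-zero only if both of its main effects are in the model, and under weak hierarchy only if at least one is. The function names and the gene labels are hypothetical and chosen for illustration; this is not the estimation procedure described above.

```python
def n_pairwise_interactions(p):
    """Number of distinct pairwise interaction terms among p predictors."""
    return p * (p - 1) // 2


def satisfies_hierarchy(main_effects, interactions, kind="strong"):
    """Check whether the selected interactions respect the given hierarchy.

    main_effects: iterable of predictor names with non-zero main effects.
    interactions: iterable of (name_j, name_k) pairs with non-zero interaction.
    kind: "strong" requires both parent main effects to be selected,
          "weak" requires at least one (assumed, standard definitions).
    """
    if kind not in ("strong", "weak"):
        raise ValueError("kind must be 'strong' or 'weak'")
    required = 2 if kind == "strong" else 1
    selected = set(main_effects)
    for j, k in interactions:
        parents_present = (j in selected) + (k in selected)
        if parents_present < required:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical example: 100 predictors already give 4950 pairwise terms.
    print(n_pairwise_interactions(100))
    # Only g1 has a main effect, but the g1:g2 interaction is selected.
    print(satisfies_hierarchy({"g1"}, [("g1", "g2")], kind="strong"))
    print(satisfies_hierarchy({"g1"}, [("g1", "g2")], kind="weak"))
```

In this example the model with a main effect for g1 only and a g1:g2 interaction violates strong hierarchy but satisfies weak hierarchy, which is the distinction the two constraints are meant to capture.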
