Simultaneous estimation of gene‐gene and gene‐environment interactions for numerous loci using double penalized log–likelihood

Many common human diseases are thought to arise from complex multifactorial processes. For these diseases, numerous genetic and environmental factors and, possibly, their interactions are expected to play a role. Analyzing the effects of many genes and environmental factors simultaneously is therefore more realistic than single-gene analysis, but the large number of genes and environmental factors poses a challenge, not least because of the limitations of the tools available for analyzing such high‐dimensional models. In the present manuscript we propose a method that is capable of identifying “true” interactions in a setting where the number of effects to be estimated is very large and can even exceed the number of observations. In essence, all possible main and interaction effects are entered into a double penalized model in which the main effects are ridge penalized and the interactions are subjected to a least absolute shrinkage and selection operator (lasso) penalty. Results from simulations and real data show that the proposed method is capable of detecting interactions even when their effect sizes are relatively small. Genet. Epidemiol. 2006. © 2006 Wiley‐Liss, Inc.
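To make the double penalty concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary outcome with a logistic log-likelihood, interactions coded as pairwise products of the predictors, and a proximal-gradient solver; none of these choices is stated in the abstract, and the names `pairwise_products`, `fit_double_penalized`, `lam_ridge`, and `lam_lasso` are hypothetical.

```python
import itertools
import numpy as np

def pairwise_products(X):
    """Build one interaction column per pair of predictors (assumed coding)."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    return np.column_stack([X[:, i] * X[:, j] for i, j in pairs])

def fit_double_penalized(X_main, X_int, y, lam_ridge=1.0, lam_lasso=0.1, n_iter=2000):
    """Minimize mean logistic loss + (lam_ridge/2)*||main||^2 + lam_lasso*||inter||_1.

    Solved by proximal gradient descent, which is an illustrative choice only.
    """
    n = len(y)
    X = np.hstack([np.ones((n, 1)), X_main, X_int])  # intercept is unpenalized
    p_main = X_main.shape[1]
    main = slice(1, 1 + p_main)
    inter = slice(1 + p_main, None)
    beta = np.zeros(X.shape[1])
    # Step size from an upper bound on the curvature of the smooth part.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / (4 * n) + lam_ridge)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n
        grad[main] += lam_ridge * beta[main]  # ridge shrinks main effects smoothly
        beta = beta - step * grad
        # Soft-thresholding: the lasso sets weak interactions exactly to zero.
        b = beta[inter]
        beta[inter] = np.sign(b) * np.maximum(np.abs(b) - step * lam_lasso, 0.0)
    return beta[0], beta[main], beta[inter]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.integers(0, 3, size=(200, 5)).astype(float)  # simulated genotypes 0/1/2
    Z = pairwise_products(G)
    y = rng.integers(0, 2, size=200).astype(float)       # simulated case/control labels
    intercept, main_eff, inter_eff = fit_double_penalized(G, Z, y)
    print("nonzero interactions:", int(np.count_nonzero(inter_eff)))
```

In this sketch the ridge term keeps every main effect in the model while shrinking it, whereas the lasso term performs selection among the much larger set of interactions. The penalty weights would in practice have to be tuned, for example by cross-validation; the abstract does not say how the authors chose them.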
