Evaluation of genetic risk score models in the presence of interaction and linkage disequilibrium

In genetic epidemiology, genetic risk prediction modeling is becoming an important area of translational success. As an increasing number of genetic variants are discovered, combining multiple variants into a genetic risk score (GRS) has become a widely used modeling strategy, implemented through a variety of approaches. Previously, we compared the performance of a simple, additive GRS with weighted GRS approaches, but our initial simulation experiment assumed very simple models without many of the complications found in real genetic studies. In particular, both interactions between variants and linkage disequilibrium (LD; indirect mapping) remain important and challenging problems for GRS modeling. In the present study, we applied two simulation strategies to mimic various types of epistasis and evaluated their impact on the performance of GRS models. We simulated a range of models demonstrating statistical interaction and LD. Three genetic risk models, a simple count GRS (SC-GRS), an odds ratio weighted GRS (OR-GRS) and an explained variance weighted GRS (EV-GRS), were compared in terms of power, type I error, C-statistic and Akaike information criterion (AIC). Simulation factors of interest included allele frequencies, effect sizes, strengths of interaction, degrees of LD and heritability. We extensively examined the extent to which these interactions could influence the performance of genetic risk models. Our results show that, in general, the weighted methods outperform the simple count method even when interaction or LD is present, with well-controlled type I error.
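
To make the three scores concrete, the following is a minimal sketch of how they might be computed for one individual. The abstract does not give the formulas, so the weights here are assumptions: SC-GRS is taken as the unweighted count of risk alleles, OR-GRS as the sum of risk-allele counts weighted by log odds ratios, and EV-GRS as the same sum with each log odds ratio further scaled by sqrt(2 * MAF * (1 - MAF)). The function names and the example inputs are hypothetical.

```python
import math


def simple_count_grs(genotypes):
    """SC-GRS (assumed form): total number of risk alleles across variants."""
    return float(sum(genotypes))


def odds_ratio_grs(genotypes, odds_ratios):
    """OR-GRS (assumed form): risk-allele counts weighted by log odds ratios."""
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


def explained_variance_grs(genotypes, odds_ratios, mafs):
    """EV-GRS (assumed form): log odds ratio weights scaled by
    sqrt(2 * MAF * (1 - MAF)), so variants that are both common and
    strongly associated contribute most."""
    return sum(
        g * math.log(o) * math.sqrt(2.0 * m * (1.0 - m))
        for g, o, m in zip(genotypes, odds_ratios, mafs)
    )


# Hypothetical individual: risk-allele counts (0, 1 or 2) at three variants,
# with illustrative odds ratios and minor allele frequencies.
genotypes = [0, 1, 2]
odds_ratios = [1.2, 1.5, 2.0]
mafs = [0.1, 0.3, 0.5]

sc = simple_count_grs(genotypes)
org = odds_ratio_grs(genotypes, odds_ratios)
ev = explained_variance_grs(genotypes, odds_ratios, mafs)
print(sc, org, ev)
```

Under these assumptions, the simple count treats every risk allele equally, whereas the two weighted scores allow variants with larger effects (and, for EV-GRS, higher frequencies) to dominate.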
