A comparison of genomic profiles of complex diseases under different models

Background
Various approaches are being used to predict individual risk of polygenic diseases from data provided by genome-wide association studies. The diseases investigated, the data sets used and the testing procedures differ substantially between studies, so it is difficult to assess which models are best suited to this task.

Results
We compared different approaches on data for seven complex diseases provided by the Wellcome Trust Case Control Consortium (WTCCC), using a within-study validation design. Risk models were inferred with a variety of learning machines and assumptions about the underlying genetic model. These included a haplotype-based approach with different haplotype lengths, and different association-level thresholds for selecting the loci included in the predictive model. Consistent with previous work, our results generally showed low accuracy in light of disease heritability and population prevalence. However, the boosting algorithm achieved a predictive area under the ROC curve (AUC) of 0.8805 for Type 1 diabetes (T1D) and 0.8087 for rheumatoid arthritis. Both values are clearly above the AUCs obtained by the other approaches and above 0.75, the minimum required for a disease to be successfully tested in an at-risk sample, which suggests that boosting is a promising approach. Its good performance seems to be related to its robustness to redundant data, such as genome-wide data sets in which linkage disequilibrium makes neighboring markers highly correlated.

Conclusions
In view of our results, the boosting approach may be suitable for modeling individual predisposition to Type 1 diabetes and rheumatoid arthritis from genome-wide data, and it merits more in-depth research.
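The abstract does not specify which boosting variant or settings were used. As a hedged illustration only, the sketch below assumes AdaBoost.M1 with decision stumps on genotypes coded as minor-allele counts (0/1/2). It scores each individual by the weighted vote of the selected stumps and measures discrimination with the AUC, computed as the Mann-Whitney probability that a randomly chosen case outranks a randomly chosen control. The data are simulated, not WTCCC data: one causal SNP, an exact copy of it standing in for a marker in perfect linkage disequilibrium, and five unassociated SNPs. All function names (`simulate`, `fit_stump`, `adaboost`, `score`, `auc`) and parameter values are hypothetical.

```python
import math
import random


def auc(labels, scores):
    """AUC as the Mann-Whitney probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stump_predict(x, j, t, polarity):
    """Decision stump on SNP j: +polarity if genotype > t, else -polarity."""
    return polarity if x[j] > t else -polarity


def fit_stump(X, y, w):
    """Return (weighted error, SNP index, threshold, polarity) of the best stump.
    Genotypes are minor-allele counts 0/1/2, so thresholds 0.5 and 1.5
    give the dominant and recessive splits."""
    best = None
    for j in range(len(X[0])):
        for t in (0.5, 1.5):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost(X, y, rounds=10):
    """AdaBoost.M1 with decision stumps; y must be coded -1 (control) / +1 (case)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        if err >= 0.5:  # no stump beats chance on the current weights
            break
        err = max(err, 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, j, t, pol))
        # Up-weight misclassified individuals, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Risk score: alpha-weighted vote of the selected stumps."""
    return sum(a * stump_predict(x, j, t, pol) for a, j, t, pol in model)


def simulate(n, rng, n_noise=5, maf=0.3):
    """Hypothetical toy case-control data: SNP 0 is causal, SNP 1 is an exact
    copy of it (perfect LD), and the remaining SNPs are unassociated noise."""
    X, y = [], []
    for _ in range(n):
        g = sum(rng.random() < maf for _ in range(2))
        noise = [sum(rng.random() < maf for _ in range(2)) for _ in range(n_noise)]
        X.append([g, g] + noise)
        p_case = 1.0 / (1.0 + math.exp(-(-1.5 + 1.5 * g)))
        y.append(1 if rng.random() < p_case else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(500, rng)
X_test, y_test = simulate(500, rng)
model = adaboost(X_train, [2 * v - 1 for v in y_train], rounds=10)
test_auc = auc(y_test, [score(model, x) for x in X_test])
print(f"stumps: {len(model)}, held-out AUC: {test_auc:.3f}")
```

One possible intuition for the robustness to redundancy mentioned in the abstract: after each round, AdaBoost reweights the samples so that the stump just chosen has a weighted error of exactly 0.5. An exact duplicate of that SNP therefore gives no advantage in the following round. This illustrates the mechanism on toy data and is not a claim about the authors' analysis.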
