Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction

Accurate prediction of disease risk from genetic factors is an important goal in human genetics research and precision medicine, because more accurate prediction models could enable more effective disease prevention and treatment strategies. Although genome-wide association studies (GWAS) have identified thousands of disease-associated genetic variants in the past decade, the accuracy of genetic risk prediction remains moderate for most diseases. This is largely due to two challenges: identifying all the functionally relevant variants, and accurately estimating their effect sizes. In this work, we introduce PleioPred, a principled framework that leverages pleiotropy and functional annotations in genetic risk prediction for complex diseases. PleioPred takes GWAS summary statistics as input and jointly models multiple genetically correlated diseases together with external information, including linkage disequilibrium and diverse functional annotations, to increase the accuracy of risk prediction. Through comprehensive simulations and real data analyses on Crohn’s disease, celiac disease and type-II diabetes, we demonstrate that our approach can substantially increase the accuracy of polygenic risk prediction and of risk stratification. For example, PleioPred separates type-II diabetes patients with early and late onset ages significantly better, illustrating its potential clinical application. Furthermore, we show that the gain in prediction accuracy is significantly correlated with the genetic correlation between the predicted disease and the jointly modeled diseases.
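To make the notion of a polygenic risk score concrete, the following is a minimal sketch of the generic additive score, a weighted sum of risk-allele counts, together with a simple AUC measure of prediction accuracy. It is not the PleioPred model: the function names, the toy genotypes, and the effect sizes are all hypothetical, and the effect sizes are assumed to have been estimated already (for instance from GWAS summary statistics).

```python
import numpy as np


def polygenic_risk_score(genotypes, effect_sizes):
    """Generic additive score: sum over variants of allele count x effect size.

    genotypes    : (n_individuals, n_variants) array of allele counts (0, 1, 2)
    effect_sizes : (n_variants,) array of per-allele effect estimates
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return genotypes @ effect_sizes


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases = scores[labels]
    controls = scores[~labels]
    # Compare every case score against every control score.
    diff = cases[:, None] - controls[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical toy data: 4 individuals, 3 variants.
genotypes = np.array([[0, 1, 2],
                      [2, 1, 0],
                      [1, 1, 1],
                      [2, 2, 0]])
effect_sizes = np.array([0.10, -0.05, 0.20])  # assumed, not real estimates
labels = np.array([1, 0, 1, 0])               # 1 = case, 0 = control

scores = polygenic_risk_score(genotypes, effect_sizes)
accuracy = auc(scores, labels)
```

In this framing, methods of the kind described above would differ mainly in how the effect sizes are estimated, while the scoring and evaluation steps stay the same.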
