Polygenic prediction via Bayesian regression and continuous shrinkage priors

Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS uses a high-dimensional Bayesian regression framework and differs from previous work by placing a continuous shrinkage (CS) prior on SNP effect sizes, which is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods.

Polygenic risk scores (PRS) have the potential to predict complex diseases and traits from genetic data. Here, Ge et al. develop PRS-CS, which uses a Bayesian regression framework, continuous shrinkage (CS) priors, and an external LD reference panel for polygenic prediction of binary and quantitative traits from GWAS summary statistics.
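To make the idea of inferring posterior effect sizes from summary statistics and an LD matrix more concrete, the following is a minimal sketch and not the PRS-CS software. It assumes standardized genotypes, a single LD block, and a normal prior on each SNP effect whose variance is the product of a global scale and a per-SNP local scale, which is one common way to write a continuous shrinkage prior. The scales are held fixed here purely for illustration, whereas a full Bayesian method would infer them. The function and parameter names (`posterior_mean_effects`, `polygenic_score`, `local_scale`, `global_scale`) are hypothetical.

```python
import numpy as np


def posterior_mean_effects(beta_hat, ld, local_scale, global_scale):
    """Conditional posterior mean of joint SNP effects in one LD block.

    Illustrative assumptions (not taken from the abstract):
      * genotypes are standardized, so `ld` is the SNP correlation matrix;
      * beta_j has a normal prior with variance proportional to
        global_scale * local_scale[j], with both scales treated as known.
    Under these assumptions the posterior mean solves
      (LD + diag(1 / (global_scale * local_scale))) @ beta = beta_hat.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    prior_precision = np.diag(1.0 / (global_scale * np.asarray(local_scale, dtype=float)))
    # Solving the linear system handles all SNPs in the block jointly,
    # which is where the multivariate modeling of local LD enters.
    return np.linalg.solve(ld + prior_precision, beta_hat)


def polygenic_score(genotypes, effects):
    """Polygenic score per individual: weighted sum of standardized genotypes."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(effects, dtype=float)


# Toy inputs, invented for illustration only.
marginal_effects = np.array([0.10, 0.08, 0.01])  # GWAS marginal estimates
ld_block = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])  # LD from a reference panel
local = np.array([1.0, 1.0, 0.1])  # smaller local scale means stronger shrinkage
shrunk = posterior_mean_effects(marginal_effects, ld_block, local, global_scale=0.5)

rng = np.random.default_rng(0)
toy_genotypes = rng.standard_normal((4, 3))  # 4 individuals, 3 SNPs
scores = polygenic_score(toy_genotypes, shrunk)
```

With an identity LD matrix the solve reduces to scaling each marginal estimate by t / (1 + t), where t is the product of the global and local scales, so a SNP with a small local scale is pulled strongly toward zero while a SNP with a large one is left nearly unchanged.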
