Large uncertainty in individual PRS estimation impacts PRS-based risk stratification

Large-scale genome-wide association studies have enabled polygenic risk scores (PRS), which estimate the genetic value of an individual for a given trait. Since PRS accuracy is typically assessed using cohort-level metrics (e.g., R²), uncertainty in PRS estimates at the individual level remains underexplored. Here we show that Bayesian PRS methods can estimate the variance of an individual’s PRS and can yield well-calibrated credible intervals for the genetic value of a single individual. For real traits in the UK Biobank (N=291,273 unrelated “white British” individuals), we observe large variance in individual PRS estimates, which impacts the interpretation of PRS-based stratification; for example, averaging across 13 traits, only 0.8% (s.d. 1.6%) of individuals with PRS point estimates in the top decile have their 95% credible intervals fully contained in the top decile. We provide an analytical estimator for individual PRS variance, expressed as a function of SNP-heritability, number of causal SNPs, and sample size, and observe high concordance with individual variances estimated via posterior sampling. Finally, as an example of the utility of individual PRS uncertainties, we explore a probabilistic approach to PRS-based stratification that estimates the probability that an individual’s genetic value lies above a prespecified threshold. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
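
The quantities described above (individual PRS variance, credible intervals, and the probability of exceeding a threshold) can be illustrated with a minimal sketch. It assumes that a Bayesian PRS method has already produced posterior samples of SNP effect sizes; here those samples are simulated, and the function name `summarize_prs`, the array shapes, and all parameter values are hypothetical choices for illustration rather than the authors' implementation.

```python
import numpy as np

def summarize_prs(genotypes, beta_samples, top_quantile=0.9, level=0.95):
    """Summarize per-individual PRS uncertainty from posterior effect samples.

    genotypes:    (n_individuals, n_snps) allele counts
    beta_samples: (n_draws, n_snps) posterior draws of SNP effects (assumed given)
    """
    # One PRS value per individual per posterior draw: shape (n, n_draws)
    prs_samples = genotypes @ beta_samples.T
    point = prs_samples.mean(axis=1)        # posterior mean PRS
    sd = prs_samples.std(axis=1, ddof=1)    # posterior standard deviation
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(prs_samples, [alpha, 1.0 - alpha], axis=1)
    # Stratification threshold: the chosen quantile of the point estimates
    threshold = np.quantile(point, top_quantile)
    # Probabilistic stratification: share of draws above the threshold
    prob_above = (prs_samples > threshold).mean(axis=1)
    return {
        "point": point, "sd": sd, "lower": lower, "upper": upper,
        "threshold": threshold, "prob_above": prob_above,
    }

# Simulated inputs (illustrative only, not real data)
rng = np.random.default_rng(0)
n, m, n_draws = 200, 50, 500
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
beta_mean = rng.normal(0.0, 0.1, size=m)
beta_samples = beta_mean + rng.normal(0.0, 0.05, size=(n_draws, m))

result = summarize_prs(genotypes, beta_samples)

# Individuals whose point estimate falls in the top decile
in_top = result["point"] >= result["threshold"]
# Of those, the ones whose whole credible interval lies above the threshold
certain = in_top & (result["lower"] > result["threshold"])
print(f"top-decile individuals: {in_top.sum()}, "
      f"with interval fully above threshold: {certain.sum()}")
```

Comparing `in_top` with `certain` mirrors the stratification comparison in the abstract, while `prob_above` gives each individual a graded probability in place of a hard above/below call.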
