LDpred2: better, faster, stronger

Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores from summary statistics and a matrix of correlations between genetic variants. However, LDpred has limitations that may reduce its predictive performance. Here we present LDpred2, a new version of LDpred that addresses these issues. LDpred2 also provides two new options: a “sparse” option that can learn effects that are exactly 0, and an “auto” option that learns the two LDpred parameters directly from the data. We benchmark the predictive performance of LDpred2 against the previous version (LDpred1) on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy. We then show that LDpred2 also outperforms other recently developed polygenic score methods, with a mean AUC of 65.1% over the 8 real traits analyzed here, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that, in contrast to what was recommended in the first version of this paper, we now recommend running LDpred2 genome-wide instead of per chromosome. LDpred2 is implemented in the R package bigsnpr.
