PRSice-2: Polygenic Risk Score software for biobank-scale data

Abstract

Background: Polygenic risk score (PRS) analyses have become an integral part of biomedical research. Among a range of applications, they are used to gain insight into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference. Substantial efforts are now devoted to biobank projects that collect large-scale genetic and phenotypic data, providing an unprecedented opportunity for genetic discovery and applications. Processing the large-scale data provided by such biobank resources requires highly efficient and scalable methods and software.

Results: Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and the alternative PRS software LDpred and lassosum, while having comparable predictive power.

Conclusion: PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set–based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
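The idea behind an empirical association P-value that is not inflated by overfitting can be illustrated with a small sketch. The Python below is not PRSice-2's implementation (which is written in C++). It is a minimal, assumed reading of the general approach: the score is a weighted sum of allele dosages, the best fit is taken over several P-value thresholds, and the same search is repeated on permuted phenotypes so that the null distribution accounts for the threshold selection. All function names, the use of squared correlation as the fit statistic, and the toy inputs are hypothetical choices made for illustration.

```python
import random


def polygenic_score(dosages, effects):
    """Weighted sum of allele dosages (0, 1 or 2) for one individual."""
    return sum(d * b for d, b in zip(dosages, effects))


def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_fit(genotypes, effects, snp_p_values, phenotype, thresholds):
    """Largest r^2 over all P-value thresholds (the step that can overfit)."""
    best = 0.0
    for t in thresholds:
        keep = [j for j, p in enumerate(snp_p_values) if p <= t]
        if not keep:
            continue
        scores = [
            polygenic_score([g[j] for j in keep], [effects[j] for j in keep])
            for g in genotypes
        ]
        best = max(best, r_squared(scores, phenotype))
    return best


def empirical_p_value(genotypes, effects, snp_p_values, phenotype,
                      thresholds, n_perm=1000, seed=0):
    """Permutation P-value (r + 1) / (n + 1) for the best-threshold fit.

    The threshold search is repeated inside every permutation, so the
    null statistics carry the same selection effect as the observed one.
    """
    rng = random.Random(seed)
    observed = best_fit(genotypes, effects, snp_p_values, phenotype, thresholds)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        null = best_fit(genotypes, effects, snp_p_values, shuffled, thresholds)
        if null >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Here `genotypes` is assumed to be a list of per-individual dosage lists, and `effects` and `snp_p_values` are assumed to come from a separate base GWAS. A call such as `empirical_p_value(genotypes, effects, snp_p_values, phenotype, [0.001, 0.05, 0.5], n_perm=10000)` would then return a P-value for the best-fitting threshold. Adding one to both numerator and denominator keeps the estimate from ever being exactly zero.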
