Resource profile and user guide of the Polygenic Index Repository

Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is rapidly growing. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs’ prediction accuracies, we constructed them using genome-wide association studies—some of which are novel—from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the “additive SNP factor.” Regressions in which the true regressor is the additive SNP factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
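The errors-in-variables point can be illustrated with a small simulation. This is only a sketch of classical attenuation and disattenuation, not the estimator derived in the paper and not its Python tool. The function name `simulate_attenuation`, the unit-variance latent factor, the noise variance, and the assumption that the reliability ratio is known are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=0.5, noise_var=1.0, seed=0):
    """Regress y on a noisy proxy and rescale the slope by the reliability.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Latent regressor (stands in for the additive SNP factor), variance 1.
    g = rng.standard_normal(n)

    # Unbiased but noisy proxy (stands in for the PGI).
    pgi = g + rng.normal(0.0, np.sqrt(noise_var), n)

    # Outcome depends on the latent factor, not on the proxy.
    y = beta * g + rng.standard_normal(n)

    # Naive OLS slope of y on the proxy: biased toward zero.
    naive = np.cov(pgi, y)[0, 1] / np.var(pgi, ddof=1)

    # Reliability ratio Var(g) / Var(pgi); assumed known here.
    reliability = 1.0 / (1.0 + noise_var)

    # Classical disattenuation: divide the naive slope by the reliability.
    corrected = naive / reliability
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate_attenuation()
    print(f"naive slope:     {naive:.3f}")
    print(f"corrected slope: {corrected:.3f}")
```

Under these assumptions the naive slope converges to roughly half the true coefficient, and dividing by the reliability ratio recovers it. In applied work the reliability is not known and must itself be estimated, which is the harder part of the problem.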

Andrew Steptoe | Jonathan P. Beauchamp | Daniel J. Benjamin | Philipp Koellinger | David Laibson | Aysu Okbay | Peter M. Visscher | Richie Poulton | Avshalom Caspi | Terrie E. Moffitt | William G. Iacono | Magnus Johannesson | Tõnu Esko | Patrick Turley | Richard Karlsson Linnér | Matt McGue | Jeremy Freese | Daniel W. Belsky | Elliot M. Tucker-Drob | David Cesarini | David A. Hinds | K. Paige Harden | Lili Milani | David L. Corcoran | Olesya Ajnakina | Joel Becker | Casper A.P. Burik | Grant Goldman | Nancy Wang | Hariharan Jayashankar | Michael Bennett | Rafael Ahlskog | Aaron Kleinman | Karen Sugden | Benjamin S. Williams | Kathleen Mullan Harris | Patrik K.E. Magnusson | Travis T. Mallard | Pamela Herd | Alexander Young | Sven Oskarsson | Michelle N. Meyer | A. Auton | V. Vacic | B. Alipanahi | K. Bryc | S. Shringarpure | J. Tung | J. Mountain | N. Furlotte | P. Fontanillas | C. Tian | S. Pitts | S. Elson | N. Litterman | J. Shelton | J. Sathirapongsasuti | M. McIntyre | M. Agee | R. Bell | K. Huber | C. Northover | J. McCreight | O. V. Sazonova | C. H. Wilson
