Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model

Estimating the polygenicity (the proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association study (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities ranging from ≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and to map out the sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correction of z-scores), and for assessing the compatibility of replication and discovery GWAS summary statistics.

Author Summary

There are ∼10 million common variants in the genome of humans with European ancestry. For any particular phenotype, a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype. Genome-wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics are collectively difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations.
Using a detailed reference panel of ∼11 million common variants – among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants – we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities – whose product allows for lower bound estimates of heritability – vary by orders of magnitude.
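
To make the relation between polygenicity, discoverability, and SNP heritability concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a point-normal mixture for the causal effects (an effect is zero with probability 1 − π₁ and Gaussian with variance σ_β² otherwise), a phenotype standardized to unit variance, mutually independent causal SNPs, and a per-SNP variance contribution of H·β², where H = 2p(1 − p) is the heterozygosity. The function names, the constant heterozygosity, and all numerical values are hypothetical and chosen only for illustration.

```python
import random


def simulate_causal_effects(n_snps, polygenicity, discoverability, seed=0):
    """Draw per-SNP effects from an assumed point-normal mixture.

    With probability `polygenicity` (pi_1) a SNP is causal and its effect is
    drawn from N(0, discoverability); otherwise its effect is exactly zero.
    """
    rng = random.Random(seed)
    sd = discoverability ** 0.5
    return [
        rng.gauss(0.0, sd) if rng.random() < polygenicity else 0.0
        for _ in range(n_snps)
    ]


def expected_snp_heritability(n_snps, polygenicity, discoverability,
                              mean_heterozygosity):
    """Expected heritability under the assumptions stated above:
    h2 = n_snps * pi_1 * sigma_beta^2 * mean heterozygosity."""
    return n_snps * polygenicity * discoverability * mean_heterozygosity


def realized_snp_heritability(betas, heterozygosities):
    """Variance explained by one simulated set of effects: sum of H_i * beta_i^2."""
    return sum(h * b * b for h, b in zip(heterozygosities, betas))


if __name__ == "__main__":
    # Hypothetical illustration values, not estimates from the paper.
    n_snps = 200_000
    pi_1 = 1e-2
    sigma_beta_sq = 1e-4
    mean_h = 0.3  # assumed constant heterozygosity for simplicity

    betas = simulate_causal_effects(n_snps, pi_1, sigma_beta_sq, seed=1)
    h2_expected = expected_snp_heritability(n_snps, pi_1, sigma_beta_sq, mean_h)
    h2_realized = realized_snp_heritability(betas, [mean_h] * n_snps)
    print("expected h2:", h2_expected)
    print("realized h2:", h2_realized)
```

In this sketch, halving the polygenicity while doubling the discoverability leaves the expected heritability unchanged, which illustrates why the two quantities carry information beyond SNP heritability alone.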
