Improving the coverage of credible sets in Bayesian genetic fine-mapping

Genome-wide association studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association. They report credible sets of plausible causal variants, which are interpreted as containing the causal variant with some “coverage probability”. Here, we use simulations to demonstrate that these coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the “adjusted coverage estimate”. This is extended to find “adjusted credible sets”, which are the smallest sets of variants whose adjusted coverage estimate meets the target coverage. We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. In 27 out of 39 associated genomic regions, our method could reduce the number of potentially causal variants to consider for follow-up, and none of the 95% or 99% credible sets required the inclusion of more variants, a pattern matched in simulations of well-powered GWAS. Crucially, our method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient use of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.
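
As a point of reference for the terminology above, the following is a minimal sketch of how a conventional credible set is commonly built from per-variant posterior probabilities of causality: variants are sorted by posterior probability and added until the cumulative sum reaches the target. It assumes a single causal variant per region and that posterior probabilities are already available. The function name `credible_set` and its inputs are hypothetical, chosen for illustration. The sketch does not implement the adjusted coverage estimate; it only shows the "claimed coverage" (the sum of posterior probabilities in the set) that such an adjustment would re-estimate.

```python
from typing import List, Sequence, Tuple


def credible_set(pp: Sequence[float], threshold: float = 0.95) -> Tuple[List[int], float]:
    """Build a conventional credible set (illustrative sketch, not the paper's method).

    pp        -- posterior probabilities of causality, one per variant
                 (assumed to sum to roughly 1 under a single-causal-variant model)
    threshold -- target coverage, e.g. 0.95 or 0.99

    Returns the indices of the variants in the set (highest probability first)
    and the claimed coverage, i.e. the sum of their posterior probabilities.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")

    # Rank variants from highest to lowest posterior probability.
    order = sorted(range(len(pp)), key=lambda i: pp[i], reverse=True)

    chosen: List[int] = []
    claimed = 0.0
    for i in order:
        chosen.append(i)
        claimed += pp[i]
        if claimed >= threshold:
            break  # smallest prefix of the ranking that reaches the target
    return chosen, claimed


if __name__ == "__main__":
    # Hypothetical posterior probabilities for four variants.
    variants, claimed = credible_set([0.05, 0.5, 0.15, 0.3], threshold=0.9)
    print(variants, round(claimed, 3))
```

In this construction the claimed coverage usually exceeds the threshold, because variants are added whole. The abstract's argument concerns a different gap: the difference between the claimed coverage and the true probability that the set contains the causal variant.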

[1]  N. Eriksson,et al.  Nature Genetics Advance Online Publication Meta-analysis of 375,000 Individuals Identifies 38 Susceptibility Loci for Migraine , 2022 .

[2]  Mary D Fortune,et al.  simGWAS: a fast method for simulation of large scale case–control GWAS summary statistics , 2018, bioRxiv.

[3]  Matthew Stephens,et al.  A simple new approach to variable selection in regression, with application to genetic fine-mapping , 2018, bioRxiv.

[4]  Jake K. Byrnes,et al.  Bayesian refinement of association signals for 14 loci in 3 common diseases , 2012, Nature Genetics.

[5]  Jon Wakefield,et al.  Bayes factors for genome‐wide association studies: comparison with P‐values , 2009, Genetic epidemiology.

[6]  Linda S. Wicker,et al.  Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases , 2019, Nature Communications.

[7]  Gabor T. Marth,et al.  A global reference for human genetic variation , 2015, Nature.

[8]  Tanya M. Teslovich,et al.  Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility , 2014, Nature Genetics.

[9]  Alicia R. Martin,et al.  Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder , 2018, Nature Genetics.

[10]  Helen E. Parkinson,et al.  The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019 , 2018, Nucleic Acids Res..

[11]  Hailiang Huang,et al.  Fine-mapping inflammatory bowel disease loci to single variant resolution , 2017, Nature.

[12]  Joseph K. Pickrell,et al.  Approximately independent linkage disequilibrium blocks in human populations , 2015, bioRxiv.

[13]  Kenneth G. C. Smith,et al.  Resolving mechanisms of immune‐mediated disease in primary CD4 T cells , 2020, bioRxiv.

[14]  John M. Greene,et al.  Locating three-dimensional roots by a bisection method , 1992 .

[15]  Eleazar Eskin,et al.  Identifying Causal Variants at Loci with Multiple Signals of Association , 2014, Genetics.

[16]  F. Collins,et al.  The geneticist's approach to complex disease. , 1996, Annual review of medicine.

[17]  Sylvia Richardson,et al.  Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping , 2015, bioRxiv.

[18]  Manuel A. R. Ferreira,et al.  Multi-ethnic genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis , 2015, Nature Genetics.

[19]  Gary D Bader,et al.  Association analysis identifies 65 new breast cancer risk loci , 2017, Nature.

[20]  J. Barrett,et al.  Strategies for fine-mapping complex traits , 2015, Human molecular genetics.

[21]  Xiaoquan Wen,et al.  Efficient Integrative Multi-SNP Association Analysis using Deterministic Approximation of Posteriors , 2015, bioRxiv.

[22]  Manolis Kellis,et al.  Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers , 2015, Nature Genetics.

[23]  Christian Gieger,et al.  Genetic fine-mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci , 2016 .

[24]  Martin Vingron,et al.  A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk , 2010, Nature.

[25]  Mark I. McCarthy,et al.  Evaluating the Performance of Fine-Mapping Strategies at Common Variant GWAS Loci , 2015, PLoS genetics.

[26]  William Valdar,et al.  Reprioritizing Genetic Associations in Hit Regions Using LASSO-Based Resample Model Averaging , 2012, Genetic epidemiology.

[27]  Matti Pirinen,et al.  FINEMAP: efficient variable selection using summary data from genome-wide association studies , 2015, bioRxiv.

[28]  Xiaoquan Wen,et al.  Bayesian Multi-SNP Genetic Association Analysis: Control of FDR and Use of Summary Statistics , 2018, bioRxiv.

[29]  Jing He,et al.  Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation , 2015, Nature Genetics.

[30]  A. Price,et al.  Dissecting the genetics of complex traits using summary association statistics , 2016, Nature Reviews Genetics.

[31]  H. Cordell,et al.  SNP Selection in Genome-Wide and Candidate Gene Studies via Penalized Logistic Regression , 2010, Genetic epidemiology.

[32]  Yara T. E. Lechanteur,et al.  Nature Genetics Advance Online Publication , 2022 .

[33]  A. Morris,et al.  Transethnic Meta-Analysis of Genomewide Association Studies , 2011, Genetic epidemiology.

[34]  D. Schaid,et al.  From genome-wide associations to candidate causal variants by statistical fine-mapping , 2018, Nature Reviews Genetics.

[35]  Tom R. Gaunt,et al.  The UK10K project identifies rare variants in health and disease , 2016 .

[36]  Sylvia Richardson,et al.  JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects , 2016, Genetic epidemiology.