Identifying Causal Variants at Loci with Multiple Signals of Association

Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants responsible for the association at these loci have been identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate the probability that each variant is causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that, under reasonable assumptions, will contain all of the true causal variants with a high confidence level (e.g., 95%), even when the locus contains multiple causal variants. We use simulations to show that our approach provides a 20-50% improvement in our ability to identify the causal variants compared to existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an eQTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus.
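
To make the idea of a set that contains all causal variants with a given confidence more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that posterior probabilities over causal configurations (sets of variants that are jointly causal) are already available; the abstract does not say how these are computed, so the numbers below are made-up toy values. The variant names and the function names `set_confidence` and `smallest_causal_set` are hypothetical. The sketch defines the confidence of a candidate set as the total posterior mass of configurations fully contained in it, and searches exhaustively for the smallest set reaching the threshold.

```python
from itertools import combinations


def set_confidence(candidate, config_posterior):
    """Posterior mass of the causal configurations fully contained in `candidate`.

    This is the probability, under the assumed posterior, that `candidate`
    includes every causal variant.
    """
    return sum(p for config, p in config_posterior.items() if config <= candidate)


def smallest_causal_set(variants, config_posterior, rho=0.95):
    """Return a smallest variant set whose confidence is at least `rho`.

    Exhaustive search over subsets: exponential in the number of variants,
    so this is only suitable for toy examples.
    """
    for size in range(len(variants) + 1):
        # Among all sets of this size, keep the one with the highest confidence.
        best = max(
            (frozenset(c) for c in combinations(sorted(variants), size)),
            key=lambda s: set_confidence(s, config_posterior),
        )
        if set_confidence(best, config_posterior) >= rho:
            return best
    # Threshold not reachable (e.g. unnormalised input): fall back to all variants.
    return frozenset(variants)


# Toy posterior over causal configurations (assumed values, summing to 1).
variants = ["rs1", "rs2", "rs3", "rs4"]
config_posterior = {
    frozenset({"rs1"}): 0.10,          # rs1 is the only causal variant
    frozenset({"rs1", "rs2"}): 0.60,   # rs1 and rs2 are both causal
    frozenset({"rs1", "rs3"}): 0.27,   # rs1 and rs3 are both causal
    frozenset({"rs2", "rs4"}): 0.03,   # rs2 and rs4 are both causal
}

causal_set = smallest_causal_set(variants, config_posterior, rho=0.95)
print(sorted(causal_set), set_confidence(causal_set, config_posterior))
```

In this toy example no single variant or pair reaches the 95% threshold, because most of the posterior mass sits on two-variant configurations. A method that assumes exactly one causal variant per locus cannot represent such configurations at all.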
