Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping

Identification of candidate causal variants in regions associated with risk of common diseases is complicated by linkage disequilibrium (LD) and multiple association signals. Nonetheless, accurate maps of these variants are needed, both to fully exploit detailed cell-specific chromatin annotation data to highlight disease causal mechanisms and cells, and to design the functional studies that will ultimately be required to confirm causal mechanisms. We adapted a Bayesian evolutionary stochastic search algorithm to the fine mapping problem, and demonstrated its improved performance over conventional stepwise and regularised regression through simulation studies. We then applied it to fine map the established multiple sclerosis (MS) and type 1 diabetes (T1D) associations in the IL-2RA (CD25) gene region. For T1D, both stepwise and stochastic search approaches identified four T1D association signals, with the major effect tagged by the single nucleotide polymorphism (SNP) rs12722496. In contrast, for MS, the stochastic search found two distinct competing models: a single candidate causal variant, tagged by rs2104286 and reported previously using stepwise analysis; and a more complex model with two association signals, one of which was tagged by the major T1D-associated SNP rs12722496 and the other by rs56382813. There is low to moderate LD between rs2104286 and both rs12722496 and rs56382813 (r² ≈ 0.3), and our two-SNP model could not be recovered through a forward stepwise search after conditioning on rs2104286. Both signals in the two-variant model for MS affect CD25 expression on distinct subpopulations of CD4+ T cells, which are key cells in the autoimmune process. The results support a shared causal variant for T1D and MS. Our study illustrates the benefit of using a purposely designed model search strategy for fine mapping and the advantage of combining disease and protein expression data.
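The LD figure quoted above (r² ≈ 0.3) is a squared correlation between two variants. As a minimal sketch, assuming genotypes are coded as allele counts (0, 1 or 2) per individual, the genotype-level r² can be computed as the squared Pearson correlation of the two count vectors. The function name `genotype_r2` and the toy data are hypothetical and are not taken from the paper, which may have used a haplotype-based estimate instead.

```python
from math import fsum


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors.

    g1, g2: sequences of allele counts (0, 1 or 2), one entry per individual.
    This is a genotype-level approximation to LD r^2, not a phased estimate.
    """
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("genotype vectors must have equal length >= 2")
    n = len(g1)
    mean1 = fsum(g1) / n
    mean2 = fsum(g2) / n
    # Sums of squares and cross-products around the means
    ss1 = fsum((a - mean1) ** 2 for a in g1)
    ss2 = fsum((b - mean2) ** 2 for b in g2)
    sp = fsum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    if ss1 == 0 or ss2 == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return sp * sp / (ss1 * ss2)


# Toy example (made-up genotypes for four individuals)
snp_a = [0, 0, 1, 1]
snp_b = [0, 0, 0, 1]
print(round(genotype_r2(snp_a, snp_b), 3))
```

With r² at this level, two SNPs carry partly overlapping information, so conditioning on one can absorb some of the signal that belongs to the others. This is consistent with the abstract's observation that forward stepwise search did not recover the two-SNP model.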
