Pleiotropic Mapping and Annotation Selection in Genome-wide Association Studies with Penalized Gaussian Mixture Models

Motivation: Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci is associated with multiple traits, a phenomenon known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits together with genome-wide single nucleotide polymorphisms (SNPs).

Results: We develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association patterns using mixture modeling, and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power and, with a sparsity-inducing penalty, is capable of selecting informative annotations from a large, potentially non-informative set. To scale inference with iMAP to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP in terms of both high association mapping power and accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project.

Availability and implementation: iMAP is freely available at http://www.xzlab.org/software.html.

Supplementary information: Supplementary data are available at Bioinformatics online.
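The mixture-modeling idea in the Results can be illustrated with a small sketch. The Python code below is not the iMAP implementation. It is a simplified example that assumes two traits, treats each SNP's pair of z-scores as a draw from one of four zero-mean bivariate Gaussian components (associated with neither trait, trait 1 only, trait 2 only, or both), and estimates only the mixing proportions by plain expectation maximization. It omits the SNP annotations and the sparsity-inducing penalty entirely and holds the effect variance fixed. The names `fit_mixture`, `rho`, and `effect_var` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Four association patterns for a pair of traits:
# (0,0) neither, (1,0) trait 1 only, (0,1) trait 2 only, (1,1) both.
PATTERNS = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])


def log_bvn_zero_mean(z, cov):
    """Log density of each row of z under a zero-mean bivariate normal."""
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ni,ij,nj->n", z, inv, z)
    return -0.5 * (quad + logdet + 2.0 * np.log(2.0 * np.pi))


def fit_mixture(z, rho=0.0, effect_var=9.0, n_iter=200, tol=1e-8):
    """Estimate mixing proportions of the four patterns by EM.

    z          : (n_snps, 2) array of z-scores for two traits
    rho        : assumed correlation of the z-scores under the null
    effect_var : assumed (fixed) extra variance for an associated trait
    """
    base = np.array([[1.0, rho], [rho, 1.0]])
    # Component covariances do not change, so densities are computed once.
    log_dens = np.column_stack(
        [log_bvn_zero_mean(z, base + np.diag(effect_var * p)) for p in PATTERNS]
    )
    pi = np.full(4, 0.25)
    for _ in range(n_iter):
        # E-step: posterior probability of each pattern for each SNP.
        log_w = log_dens + np.log(pi)
        log_w -= log_w.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_w)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: proportions are the average posterior memberships.
        new_pi = post.mean(axis=0)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi, post
```

Calling `fit_mixture(z)` on an `(n_snps, 2)` array returns the estimated proportions of the four patterns and the per-SNP posterior probabilities. In a fuller model of the kind the abstract describes, the proportions would instead depend on SNP annotations through a penalized regression, which this sketch deliberately leaves out.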
