TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits

Transcriptome-wide association studies (TWAS), which test for association between a study trait and gene expression levels imputed from cis-acting expression quantitative trait loci (cis-eQTL) genotypes, have successfully enhanced the discovery of genetic risk loci for complex traits. By using gene expression imputation models fitted on reference datasets that have both genetic and transcriptomic data, TWAS facilitates gene-based tests with GWAS data while accounting for the reference transcriptomic data. Existing TWAS tools such as PrediXcan and FUSION use parametric imputation models, which are limited in their ability to model the complex genetic architecture of transcriptomic data. To address this limitation, we propose a Bayesian method that assumes a data-driven nonparametric prior to impute gene expression. The nonparametric Bayesian method is flexible and general because it includes the parametric imputation models used by PrediXcan and FUSION as special cases. Our simulation studies showed that the nonparametric Bayesian model improved both the imputation R² for transcriptomic data and the TWAS power relative to PrediXcan. In real applications, our nonparametric Bayesian method fitted transcriptomic imputation models for 57.6% more genes than PrediXcan, thus improving the power of follow-up TWAS. Hence, the nonparametric Bayesian model is preferred for modeling the complex genetic architecture of transcriptomes and is expected to enhance transcriptome-integrated genetic association studies. We implement our Bayesian approach in a convenient software tool, “TIGAR” (Transcriptome-Integrated Genetic Association Resource), which imputes transcriptomic data and performs subsequent TWAS using individual-level or summary-level GWAS data.
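The two-stage TWAS workflow described above (fit an imputation model on a reference panel, impute expression in the GWAS sample, then test the imputed expression against the trait) can be illustrated with a minimal sketch on simulated data. This is not TIGAR's implementation: ridge regression is used here purely as a simple stand-in for the imputation model, not the nonparametric Bayesian prior or the models used by PrediXcan and FUSION. All function names, sample sizes, and effect sizes are hypothetical choices for illustration.

```python
import numpy as np


def fit_ridge_weights(G_ref, E_ref, lam=1.0):
    """Estimate cis-eQTL weights from reference data by ridge regression.

    Ridge is only a stand-in here (an assumption of this sketch), not the
    nonparametric Bayesian model proposed in the paper.
    """
    G = G_ref - G_ref.mean(axis=0)          # center genotypes
    e = E_ref - E_ref.mean()                # center expression
    p = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(p), G.T @ e)


def impute_expression(G, weights):
    """Impute genetically regulated expression from genotypes and weights."""
    return (G - G.mean(axis=0)) @ weights


def twas_z(grex, trait):
    """Z-score from a simple linear regression of trait on imputed expression."""
    x = grex - grex.mean()
    y = trait - trait.mean()
    n = len(y)
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (n - 2) / (x @ x))
    return beta / se


# Simulated toy data (all sizes and effects are hypothetical).
rng = np.random.default_rng(0)
n_ref, n_gwas, p = 300, 2000, 50
maf = rng.uniform(0.2, 0.5, size=p)         # minor allele frequencies
G_ref = rng.binomial(2, maf, size=(n_ref, p)).astype(float)
G_gwas = rng.binomial(2, maf, size=(n_gwas, p)).astype(float)

w_true = np.zeros(p)
w_true[:3] = [0.8, -0.6, 0.5]               # three causal cis-eQTL
E_ref = G_ref @ w_true + rng.normal(size=n_ref)
trait = 0.5 * (G_gwas @ w_true) + rng.normal(size=n_gwas)

# Stage 1: fit the imputation model on the reference panel.
w_hat = fit_ridge_weights(G_ref, E_ref, lam=5.0)
# Stage 2: impute expression in the GWAS sample and test association.
grex = impute_expression(G_gwas, w_hat)
z = twas_z(grex, trait)
print(f"TWAS z-score for the simulated gene: {z:.2f}")
```

Swapping `fit_ridge_weights` for a different estimator changes only stage 1; the gene-based test in stage 2 is unchanged, which is why the choice of imputation model directly affects downstream TWAS power.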
