Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues

Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h2 can be relatively well characterized with 59% of expressed genes showing significant h2 (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h2. Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimal performing gene expression predictors via elastic net modeling. To further explore the tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R2 for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).

[1]  P. Visscher,et al.  Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets , 2016, Nature Genetics.

[2]  Trevor Hastie,et al.  Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. , 2011, Journal of statistical software.

[3]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[4]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[5]  J. Marchini,et al.  Fast and accurate genotype imputation in genome-wide association studies through pre-phasing , 2012, Nature Genetics.

[6]  Gonçalo R. Abecasis,et al.  Minimac2: Faster Genotype Imputation , 2015, Bioinform..

[7]  Adan Valladares-Salgado,et al.  Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. , 2014, American journal of human genetics.

[8]  Han Xu,et al.  Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. , 2014, American journal of human genetics.

[9]  D. Bates,et al.  Fitting Linear Mixed-Effects Models Using lme4 , 2014, 1406.5823.

[10]  T. Speed,et al.  Summaries of Affymetrix GeneChip probe level data. , 2003, Nucleic acids research.

[11]  Justin Zobel,et al.  Performance and Robustness of Penalized and Unpenalized Methods for Genetic Prediction of Complex Human Disease , 2013, Genetic epidemiology.

[12]  Andrew J. Hill,et al.  Analysis of protein-coding genetic variation in 60,706 humans , 2015, bioRxiv.

[13]  A. Long,et al.  Genetic Dissection of the Drosophila melanogaster Female Head Transcriptome Reveals Widespread Allelic Heterogeneity , 2014, PLoS genetics.

[14]  C. Wallace,et al.  Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics , 2013, PLoS genetics.

[15]  Kaanan P. Shah,et al.  A gene-based association method for mapping traits using reference transcriptome data , 2015, Nature Genetics.

[16]  E. Dermitzakis,et al.  Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations , 2010, PLoS genetics.

[17]  Bronwen L. Aken,et al.  GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.

[18]  M. Stephens,et al.  A Statistical Framework for Joint eQTL Analysis in Multiple Tissues , 2012, PLoS genetics.

[19]  Christopher D. Brown,et al.  Integrative Modeling of eQTLs and Cis-Regulatory Elements Suggests Mechanisms Underlying Cell Type Specificity of eQTLs , 2012, PLoS genetics.

[20]  D. Koller,et al.  Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals , 2013, Genome research.

[21]  Robert L. Grossman,et al.  The Design of a Community Science Cloud: The Open Science Data Cloud Perspective , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.

[22]  Andrey A. Shabalin,et al.  Matrix eQTL: ultra fast eQTL analysis via large matrix operations , 2011, Bioinform..

[23]  Peter Kraft,et al.  Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis , 2012, Nature Genetics.

[24]  Alan M. Kwong,et al.  A reference panel of 64,976 haplotypes for genotype imputation , 2015, Nature Genetics.

[25]  Tanya M. Teslovich,et al.  Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes , 2012, Nature Genetics.

[26]  Daniel Gianola,et al.  Predicting genetic predisposition in humans: the promise of whole-genome markers , 2010, Nature Reviews Genetics.

[27]  T. Lehtimäki,et al.  Integrative approaches for large-scale transcriptome-wide association studies , 2015, Nature Genetics.

[28]  R. Durbin,et al.  Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses , 2012, Nature Protocols.

[29]  Lude Franke,et al.  Mediation Analysis Demonstrates That Trans-eQTLs Are Often Explained by Cis-Mediation: A Genome-Wide Analysis among 1,800 South Asians , 2014, PLoS genetics.

[30]  P. Deloukas,et al.  Patterns of Cis Regulatory Variation in Diverse Human Populations , 2012, PLoS genetics.

[31]  Robert L. Grossman,et al.  Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets , 2014, J. Am. Medical Informatics Assoc..

[32]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[33]  L. Kruglyak,et al.  The role of regulatory variation in complex traits and disease , 2015, Nature Reviews Genetics.

[34]  P. Visscher,et al.  GCTA: a tool for genome-wide complex trait analysis. , 2011, American journal of human genetics.

[35]  A. E. Hoerl,et al.  Ridge Regression: Applications to Nonorthogonal Problems , 1970 .

[36]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[37]  Carson C Chow,et al.  Second-generation PLINK: rising to the challenge of larger and richer datasets , 2014, GigaScience.

[38]  N. Cox,et al.  Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS , 2010, PLoS genetics.

[39]  Hae Kyung Im,et al.  Poly‐Omic Prediction of Complex Traits: OmicKriging , 2013, Genetic epidemiology.

[40]  Eran Segal,et al.  Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information , 2013, PLoS genetics.

[41]  Jun S. Liu,et al.  The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans , 2015, Science.

[42]  Alkes L. Price,et al.  Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals , 2011, PLoS genetics.

[43]  P. Visscher,et al.  Common polygenic variation contributes to risk of schizophrenia and bipolar disorder , 2009, Nature.

[44]  Roby Joehanes,et al.  Identification of common genetic variants controlling transcript isoform variation in human whole blood , 2015, Nature Genetics.

[45]  Xia Yang,et al.  Sherlock: detecting gene-disease associations by matching patterns of expression QTL and GWAS. , 2013, American journal of human genetics.

[46]  P. Sullivan,et al.  Heritability and Genomics of Gene Expression in Peripheral Blood , 2014, Nature Genetics.

[47]  Justin O Borevitz,et al.  Genetic architecture of regulatory variation in Arabidopsis thaliana. , 2011, Genome research.

[48]  Xiang Zhou,et al.  Polygenic Modeling with Bayesian Sparse Linear Mixed Models , 2012, PLoS genetics.

[49]  Doug Speed,et al.  Improved heritability estimation from genome-wide SNPs. , 2012, American journal of human genetics.

[50]  John D. Storey,et al.  Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis , 2007, PLoS genetics.

[51]  O. Delaneau,et al.  A linear complexity phasing method for thousands of genomes , 2011, Nature Methods.

[52]  Joshua T. Burdick,et al.  Mapping determinants of human gene expression by regional and genome-wide association , 2005, Nature.

[53]  Benjamin D. Greenberg,et al.  Partitioning the Heritability of Tourette Syndrome and Obsessive Compulsive Disorder Reveals Differences in Genetic Architecture , 2013, PLoS genetics.

[54]  M. Stephens,et al.  Genome-wide Efficient Mixed Model Analysis for Association Studies , 2012, Nature Genetics.

[55]  Christopher D. Brown,et al.  Identification, Replication, and Functional Fine-Mapping of Expression Quantitative Trait Loci in Primary Human Liver Tissue , 2011, PLoS genetics.