Genetic architecture of gene expression traits across diverse populations

For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architecture of complex traits varies between populations, and how that variation affects genetic prediction, is not well understood, in part because of the historical paucity of GWAS in populations of non-European ancestry. We used data from the Multi-Ethnic Study of Atherosclerosis (MESA) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression data were available for individuals of African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show that genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross-validation to optimize genotypic predictors of gene expression in each population, we show that the genetic architecture of gene expression for most predictable genes is sparse. The best-predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations, with R² > 0.8 in each. However, we identified a subset of genes that are well predicted in one population but poorly predicted in another, and we show that these differences in predictive performance are due to allele frequency differences between populations. When genotype weights trained in MESA were used to predict gene expression in independent populations, a training set with ancestry similar to that of the test set predicted expression better, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.
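One way to see why allele frequency differences alone can change predictive performance is a textbook single-variant calculation. It is not the analysis performed in the study. The sketch below assumes an additive SNP in Hardy-Weinberg equilibrium, for which the genetic variance is 2p(1 - p)β², and computes the expected R² for the same effect size at two allele frequencies. The function name, the effect size, the residual variance, and the two frequencies are hypothetical values chosen only for illustration.

```python
def variance_explained(beta: float, freq: float, residual_var: float = 1.0) -> float:
    """Expected R^2 of one additive SNP, assuming Hardy-Weinberg equilibrium.

    beta: per-allele effect on expression (hypothetical value)
    freq: frequency of the effect allele in the population
    residual_var: expression variance not explained by this SNP (assumed)
    """
    genetic_var = 2.0 * freq * (1.0 - freq) * beta ** 2
    return genetic_var / (genetic_var + residual_var)


# Same effect size, two illustrative allele frequencies (not taken from the study).
beta = 1.0
for label, freq in [("population A", 0.40), ("population B", 0.02)]:
    r2 = variance_explained(beta, freq)
    print(f"{label}: allele frequency {freq:.2f}, expected R^2 = {r2:.3f}")
```

Under these assumptions, a variant that is common in one population and rare in another explains far less expression variance in the second, even though its per-allele effect is identical. A model trained where the variant is rare is also less likely to select it at all.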
