Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals

Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation--by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Martin J. Lercher,et al.  Clustering of housekeeping genes provides a unified model of gene order in the human genome , 2002, Nature Genetics.

[3]  Bert Vogelstein,et al.  Allelic Variation in Human Gene Expression , 2002, Science.

[4]  A. E. Hirsh,et al.  Evolutionary Rate in the Protein Interaction Network , 2002, Science.

[5]  Mike Tyers,et al.  The GRID: The General Repository for Interaction Datasets , 2003, Genome Biology.

[6]  M. Nóbrega,et al.  Scanning Human Gene Deserts for Long-Range Enhancers , 2003, Science.

[7]  E. Koonin,et al.  Conservation and coevolution in the scale-free human gene coexpression network. , 2004, Molecular biology and evolution.

[8]  Richard A Flavell,et al.  Long-range intrachromosomal interactions in the T helper type 2 cytokine locus , 2004, Nature Immunology.

[9]  Thomas J. Hudson,et al.  Cis-Acting Regulatory Variation in the Human Genome , 2004, Science.

[10]  D. Kleinjan,et al.  Long-range control of gene expression: emerging mechanisms and disruption in disease. , 2005, American journal of human genetics.

[11]  J. Castle,et al.  An integrative genomics approach to infer causal associations between gene expression and disease , 2005, Nature Genetics.

[12]  Tom Misteli,et al.  A non-random walk through the genome , 2005, Genome Biology.

[13]  L. Almasy,et al.  Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes , 2007, Nature Genetics.

[14]  R. Redon,et al.  Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes , 2007, Science.

[15]  Maria Victoria Schneider,et al.  MINT: a Molecular INTeraction database. , 2002, FEBS letters.

[16]  Mark Gerstein,et al.  Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation , 2006, Nucleic Acids Res..

[17]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[18]  Jacek Majewski,et al.  Genome-wide analysis of transcript isoform variation in humans , 2008, Nature Genetics.

[19]  Scott A. Rifkin,et al.  Revealing the architecture of gene regulation: the promise of eQTL studies. , 2008, Trends in genetics : TIG.

[20]  Thomas J. Hudson,et al.  Differential Allelic Expression in the Human Genome: A Robust Approach To Identify Genetic and Epigenetic Cis-Acting Mechanisms Regulating Gene Expression , 2008, PLoS genetics.

[21]  K. Deichmann,et al.  Effect of Variation in CHI 3 L 1 on Serum YKL-40 Level , Risk of Asthma , and Lung Function , 2008 .

[22]  Ying Sun,et al.  Effect of variation in CHI3L1 on serum YKL-40 level, risk of asthma, and lung function. , 2008, The New England journal of medicine.

[23]  W. Bodmer,et al.  Common and rare variants in multifactorial susceptibility to common diseases , 2008, Nature Genetics.

[24]  H. Stefánsson,et al.  Genetics of gene expression and its effect on disease , 2008, Nature.

[25]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[26]  P. Deloukas,et al.  Common Regulatory Variation Impacts Gene Expression in a Cell Type–Dependent Manner , 2009, Science.

[27]  David A. Drubin,et al.  Learning a Prior on Regulatory Potential from eQTL Data , 2009, PLoS genetics.

[28]  H. Fraser,et al.  Common polymorphic transcript variation in human disease. , 2009, Genome research.

[29]  K. Dewar,et al.  Targeted screening of cis-regulatory variation in human haplotypes. , 2008, Genome research.

[30]  Albert J. Vilella,et al.  EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. , 2009, Genome research.

[31]  I. Amit,et al.  Comprehensive mapping of long range interactions reveals folding principles of the human genome , 2011 .

[32]  Christopher B. Miller,et al.  Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. , 2009, The New England journal of medicine.

[33]  Jehyuk Lee,et al.  Digital RNA Allelotyping Reveals Tissue-specific and Allele-specific Gene Expression in Human , 2009, Nature Methods.

[34]  Junjun Zhang,et al.  BioMart Central Portal—unified access to biological data , 2009, Nucleic Acids Res..

[35]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[36]  Mathieu Blanchette,et al.  Global patterns of cis variation in human cells revealed by high-density allelic expression analysis , 2009, Nature Genetics.

[37]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[38]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[39]  Peter M. Rice,et al.  The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants , 2009, Nucleic acids research.

[40]  R. Guigó,et al.  Transcriptome genetics using second generation sequencing in a Caucasian population , 2010, Nature.

[41]  E. Dermitzakis,et al.  Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations , 2010, PLoS genetics.

[42]  Aaron R. Quinlan,et al.  Bioinformatics Applications Note Genome Analysis Bedtools: a Flexible Suite of Utilities for Comparing Genomic Features , 2022 .

[43]  Joseph K. Pickrell,et al.  Understanding mechanisms underlying human gene expression variation with RNA sequencing , 2010, Nature.

[44]  Avi Ma'ayan,et al.  ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments , 2010, Bioinform..

[45]  R. Spielman,et al.  Polymorphic Cis- and Trans-Regulation of Human Gene Expression , 2010, PLoS biology.

[46]  Cole Trapnell,et al.  Improving RNA-Seq expression estimates by correcting for fragment bias , 2011, Genome Biology.

[47]  Gary D. Bader,et al.  The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function , 2010, Nucleic Acids Res..

[48]  Daniel Rios,et al.  Bioinformatics Applications Note Databases and Ontologies Deriving the Consequences of Genomic Variants with the Ensembl Api and Snp Effect Predictor , 2022 .

[49]  H. Stunnenberg,et al.  PML-RARalpha/RXR Alters the Epigenetic Landscape in Acute Promyelocytic Leukemia. , 2010, Cancer cell.

[50]  Christian Gieger,et al.  New gene functions in megakaryopoiesis and platelet formation , 2011, Nature.

[51]  Qianqian Zhu,et al.  A genome-wide comparison of the functional properties of rare and common genetic variants in humans. , 2011, American journal of human genetics.

[52]  Christian Gieger,et al.  Genetic Variants in Novel Pathways Influence Blood Pressure and Cardiovascular Disease Risk , 2011, Nature.

[53]  D. G. Clark,et al.  Common variants in MS4A4/MS4A6E, CD2uAP, CD33, and EPHA1 are associated with late-onset Alzheimer’s disease , 2011, Nature Genetics.

[54]  J. Shendure,et al.  Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data , 2011, Nature Reviews Genetics.

[55]  Matthew Stephens,et al.  Dissecting the regulatory architecture of gene expression QTLs , 2012, Genome Biology.

[56]  Jingyuan Fu,et al.  Trans-eQTLs Reveal That Independent Genetic Variants Associated with a Complex Phenotype Converge on Intermediate Genes, with a Major Role for the HLA , 2011, PLoS genetics.

[57]  F. Vannberg,et al.  GENETICS OF GENE EXPRESSION IN PRIMARY IMMUNE CELLS IDENTIFIES CELL-SPECIFIC MASTER REGULATORS AND ROLES OF HLA ALLELES , 2012, Nature Genetics.

[58]  Joseph K. Pickrell,et al.  DNaseI sensitivity QTLs are a major determinant of human expression variation , 2011, Nature.

[59]  P. Deloukas,et al.  Patterns of Cis Regulatory Variation in Diverse Human Populations , 2012, PLoS genetics.

[60]  R. Guigó,et al.  Estimation of alternative splicing variability in human populations. , 2012, Genome research.

[61]  A. Pandey,et al.  Human Protein Reference Database and Human Proteinpedia as resources for phosphoproteome analysis. , 2012, Molecular bioSystems.

[62]  Simon C. Potter,et al.  Mapping cis- and trans-regulatory effects across multiple tissues in twins , 2012, Nature Genetics.

[63]  David Z. Chen,et al.  Architecture of the human regulatory network derived from ENCODE data , 2012, Nature.

[64]  Joseph K. Pickrell,et al.  Exon-Specific QTLs Skew the Inferred Distribution of Expression QTLs Detected Using Gene Expression Array Data , 2012, PloS one.

[65]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[66]  S. Batzoglou,et al.  Linking disease associations with regulatory information in the human genome , 2012, Genome research.

[67]  Rafael C. Jimenez,et al.  The IntAct molecular interaction database in 2012 , 2011, Nucleic Acids Res..

[68]  Jesse R. Dixon,et al.  Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions , 2012, Nature.

[69]  Eurie L. Hong,et al.  Annotation of functional variation in personal genomes using RegulomeDB , 2012, Genome research.

[70]  Martin Renqiang Min,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[71]  Daphne Koller,et al.  Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge , 2013, PloS one.

[72]  Liming Liang,et al.  A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines , 2013, Genome research.