Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross

Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
