Saturation mutagenesis of disease-associated regulatory elements

The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we performed saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single-nucleotide substitutions and deletions. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or various integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and also comprise a gold-standard dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
