Loss-of-function tolerance of enhancers in the human genome

Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of the human genome to the loss of enhancers has not yet been evaluated. Here we present a catalog of LoF-tolerant enhancers identified using structural variants from whole-genome sequences. Using a conservative approach, we estimate that each individual human genome carries, on average, at least 28 LoF-tolerant enhancers. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers are more tissue-specific and regulate fewer, more dispensable genes. They are enriched in immune-related cells, whereas LoF-intolerant enhancers are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of enhancers, which achieved an AUROC of 96%. We predict that a further 5,677 enhancers are likely to be LoF-tolerant and that 75 enhancers are highly LoF-intolerant. Our predictions are supported by a known set of disease enhancers and by novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
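For readers unfamiliar with the AUROC metric quoted above, the following is a minimal sketch of how it can be computed from classifier scores. It uses the standard pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half). The function name `auroc` and the toy labels and scores are hypothetical and purely illustrative; they are not the data or the implementation used in the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 (here, 1 could stand for "LoF-tolerant"; an assumption)
    scores: iterable of classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three positives, three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, so the value is 8/9. The quadratic loop is chosen for readability; for large datasets a rank-based computation or a library routine would be more appropriate.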
