Cross-species analysis of enhancer logic using deep learning

Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type–specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data from 26 melanoma samples across six species. We demonstrate the accuracy of DeepMEL predictions in the CAGI5 challenge, where it significantly outperforms existing models on the IRF4 melanoma enhancer. Next, we exploit DeepMEL to analyze enhancer architectures and to identify accurate transcription factor binding sites for the core regulatory complexes of the two melanoma states, revealing distinct roles for each transcription factor in nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species where sequence alignment fails, and highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL is available in the Kipoi repository, where it can be used to predict and optimize candidate enhancers and to prioritize enhancer mutations. Our computational strategy can also be applied to other cancer or normal cell types.
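One common way a sequence model can be used to prioritize enhancer mutations is in silico saturation mutagenesis: score the reference sequence, score every single-nucleotide substitution, and rank substitutions by the change in score. The sketch below is hypothetical and is not DeepMEL. A single hand-built convolutional filter with global max-pooling stands in for a trained network, the E-box motif CACGTG is chosen only for illustration, and the names `motif_filter`, `conv_max_score` and `saturation_mutagenesis` are invented for this example.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence; columns are ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif: str) -> List[List[float]]:
    """Hypothetical toy filter: weight 1.0 for the motif base, 0.0 otherwise."""
    return one_hot(motif)


def conv_max_score(seq: str, filt: List[List[float]]) -> float:
    """Slide the filter over the one-hot sequence (valid 1D convolution)
    and return the maximum activation (global max-pooling)."""
    x = one_hot(seq)
    width = len(filt)
    best = float("-inf")
    for start in range(len(x) - width + 1):
        activation = sum(
            x[start + j][k] * filt[j][k]
            for j in range(width)
            for k in range(len(BASES))
        )
        best = max(best, activation)
    return best


def saturation_mutagenesis(
    seq: str, filt: List[List[float]]
) -> Dict[Tuple[int, str, str], float]:
    """Score every single-nucleotide substitution as (mutant score - reference score).
    Keys are (position, reference base, alternative base); negative values mean
    the substitution is predicted to weaken the site."""
    seq = seq.upper()
    ref_score = conv_max_score(seq, filt)
    effects: Dict[Tuple[int, str, str], float] = {}
    for i, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, ref_base, alt)] = conv_max_score(mutant, filt) - ref_score
    return effects


if __name__ == "__main__":
    seq = "TTCACGTGTT"  # hypothetical enhancer fragment with one CACGTG E-box
    filt = motif_filter("CACGTG")
    effects = saturation_mutagenesis(seq, filt)
    # Rank substitutions from most damaging to least damaging.
    for (pos, ref, alt), delta in sorted(effects.items(), key=lambda kv: kv[1])[:5]:
        print(f"pos {pos} {ref}>{alt}: {delta:+.1f}")
```

In this toy case every substitution inside the E-box lowers the score by 1.0, while substitutions in the flanks score 0.0. With a real model, the same loop would call the network's prediction function in place of `conv_max_score`.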
