Predicting Enhancer-Promoter Interaction from Genomic Sequence with Deep Neural Networks

In the human genome, distal enhancers regulate target genes by forming enhancer-promoter interactions with proximal promoters. Although recently developed high-throughput experimental approaches can identify potential enhancer-promoter interactions genome-wide, it remains largely unknown whether sequence-level instructions encoded in the genome help govern such interactions. Here we report a new computational method, named “SPEID”, that uses deep learning models to predict enhancer-promoter interactions from sequence-based features only, given the locations of putative enhancers and promoters in a particular cell type. Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions when compared with state-of-the-art methods that use non-sequence features from functional genomic signals. This work shows for the first time that sequence-based features alone can reliably predict enhancer-promoter interactions genome-wide, providing important insights into the sequence determinants of long-range gene regulation.
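
To make "sequence-based features only" concrete, the sketch below shows one common way a sequence-only model input can be prepared: one-hot encoding the DNA of an enhancer and a promoter. This is an illustration under stated assumptions, not a description of SPEID's actual preprocessing; the channel order `ACGT`, the all-zero treatment of ambiguous bases, and the helper names `one_hot` and `encode_pair` are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Channel order for the encoding (an assumption of this sketch).
ALPHABET = "ACGT"

def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix.

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows

def encode_pair(enhancer: str, promoter: str) -> Tuple[List[List[int]], List[List[int]]]:
    """Encode an (enhancer, promoter) sequence pair as two one-hot matrices."""
    return one_hot(enhancer), one_hot(promoter)

# Toy sequences; real enhancer and promoter regions are far longer.
enh, prom = encode_pair("ACGTN", "ttga")
```

Each matrix has one row per base and one column per nucleotide, so a model that consumes such a pair sees only the sequence and no functional genomic signals.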

[27]  Eric S. Lander,et al.  A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping , 2015, Cell.

[28]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[29]  Christopher M. Vockley,et al.  Regulation of chromatin accessibility and Zic binding at enhancers in the developing cerebellum , 2015, Nature Neuroscience.

[30]  Jing Liang,et al.  Chromatin architecture reorganization during stem cell differentiation , 2015, Nature.

[31]  Wyeth W. Wasserman,et al.  Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters , 2015, RECOMB.

[32]  Dariusz M Plewczynski,et al.  CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription , 2015, Cell.

[33]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Michael Q. Zhang,et al.  CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function , 2015, Cell.

[36]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[37]  Alireza F. Siahpirani,et al.  A predictive modeling approach for cell line-specific long-range regulatory interactions , 2015, Nucleic acids research.

[38]  B. Frey,et al.  Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.

[39]  Michael Q. Zhang,et al.  Integrative analysis of 111 reference human epigenomes , 2015, Nature.

[40]  Wei Wang,et al.  Constructing 3D interaction maps from 1D epigenomes , 2016, Nature Communications.

[41]  David R. Kelley,et al.  Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015, bioRxiv.

[42]  Anna Shcherbina,et al.  Not Just a Black Box: Learning Important Features Through Propagating Activation Differences , 2016, ArXiv.

[43]  Xiaohui S. Xie,et al.  DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences , 2015, bioRxiv.

[44]  David J. Arenillas,et al.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles , 2015, Nucleic Acids Res..

[45]  K. Pollard,et al.  Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin , 2016, Nature Genetics.

[46]  O. Stegle,et al.  Deep learning for computational biology , 2016, Molecular systems biology.

[47]  Vladimir B. Bajic,et al.  HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models , 2015, Nucleic Acids Res..

[48]  Kevin Y. Yip,et al.  Reconstruction of enhancer–target networks in 935 samples of human primary cells, tissues and cell lines , 2017, Nature Genetics.

[49]  N. Jojic,et al.  Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences , 2017, bioRxiv.

[50]  Tao Jiang,et al.  TITER: predicting translation initiation sites by deep learning , 2017, bioRxiv.

[51]  Ruochi Zhang,et al.  Exploiting sequence-based features for predicting enhancer–promoter interactions , 2017, Bioinform..

[52]  Nebojsa Jojic,et al.  Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences , 2017 .

[53]  Jianyang Zeng,et al.  TIDE: predicting translation initiation sites by deep learning , 2017, bioRxiv.

[54]  Ann Dean,et al.  LDB1-mediated enhancer looping can be established independent of mediator and cohesin , 2017, Nucleic acids research.

[55]  W. Wasserman,et al.  Genome-wide prediction of cis-regulatory regions using supervised deep learning methods , 2016, BMC Bioinformatics.

[56]  Arshdeep Sekhon,et al.  Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin , 2017, bioRxiv.