Predicting enhancers with deep convolutional neural networks

BackgroundWith the rapid development of deep sequencing techniques in the recent years, enhancers have been systematically identified in such projects as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time consuming for large scale identification of enhancers across a variety of tissues under different disease status, making computational identification of enhancers indispensable.ResultsTo facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method purely relies on DNA sequences to predict enhancers in an end-to-end manner by using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in the classification of enhancers against random sequences, exhibiting advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of such techniques as max-pooling and batch normalization in our method. To gain the interpretability of our approach, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.ConclusionsDeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby prompting the use of machine learning methods in life sciences.

[1]  Michael A. Beer,et al.  Discriminative prediction of mammalian enhancers from DNA sequence. , 2011, Genome research.

[2]  B. Frey,et al.  Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.

[3]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[4]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[5]  Cesare Furlanello,et al.  A promoter-level mammalian expression atlas , 2015 .

[6]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[7]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[8]  Morteza Mohammad Noori,et al.  Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features , 2014, PLoS Comput. Biol..

[9]  Yanjun Qi,et al.  Deep Motif: Visualizing Genomic Sequence Classifications , 2016, ArXiv.

[10]  Ya Guo,et al.  Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9 , 2015, Journal of molecular cell biology.

[11]  David R. Kelley,et al.  Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015, bioRxiv.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[14]  Ning Chen,et al.  Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding , 2017, Bioinform..

[15]  Michael R. Green,et al.  Transcriptional regulatory elements in the human genome. , 2006, Annual review of genomics and human genetics.

[16]  Mikael Bodén,et al.  MEME Suite: tools for motif discovery and searching , 2009, Nucleic Acids Res..

[17]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[18]  Christoph Lahtz,et al.  Epigenetic changes of DNA repair genes in cancer. , 2011, Journal of molecular cell biology.

[19]  E. Birney,et al.  High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. , 2011, Genome research.

[20]  G. Bejerano,et al.  Enhancers: five essential questions , 2013, Nature Reviews Genetics.

[21]  Manolis Kellis,et al.  The NF-κB genomic landscape in lymphoblastoid B cells. , 2014, Cell reports.

[22]  Shane C. Dillon,et al.  The landscape of histone modifications across 1% of the human genome in five human cell lines. , 2007, Genome research.

[23]  Manolis Kellis,et al.  ChromHMM: automating chromatin-state discovery and characterization , 2012, Nature Methods.

[24]  Xiaohui Xie,et al.  DANN: a deep learning approach for annotating the pathogenicity of genetic variants , 2015, Bioinform..

[25]  Xiaohui S. Xie,et al.  DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences , 2015, bioRxiv.

[26]  David J. Arenillas,et al.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles , 2015, Nucleic Acids Res..

[27]  A. Visel,et al.  Large-Scale Discovery of Enhancers from Human Heart Tissue , 2011, Nature Genetics.

[28]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[29]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[30]  William Stafford Noble,et al.  Quantifying similarity between motifs , 2007, Genome Biology.

[31]  J. T. Kadonaga,et al.  Going the distance: a current view of enhancer action. , 1998, Science.

[32]  A. Besaratinia,et al.  Epigenetics of human melanoma: promises and challenges. , 2014, Journal of molecular cell biology.

[33]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[34]  T. Meehan,et al.  An atlas of active enhancers across human cell types and tissues , 2014, Nature.

[35]  Nathaniel D Heintzman,et al.  Finding distal regulatory elements in the human genome. , 2009, Current opinion in genetics & development.

[36]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.