DeepEnhancer: Predicting enhancers by convolutional neural networks

Enhancers are crucial to the understanding of mechanisms underlying gene transcriptional regulation. Although having been successfully applied in such projects as ENCODE and Roadmap to generate landscape of enhancers in human cell lines, high-throughput biological experimental techniques are still costly and time consuming for even larger scale identification of enhancers across a variety of tissues under different disease status, making computational identification of enhancers indispensable. In this paper, we propose a computational framework, named DeepEnhancer, to classify enhancers from background genomic sequences. We construct convolutional neural networks of various architectures and compare the classification performance with traditional sequence-based classifiers. We first train the deep learning model on the FANTOM5 permissive enhancer dataset, and then fine-tune the model on ENCODE cell type-specific enhancer datasets by adopting the transfer learning strategy. Experimental results demonstrate that DeepEnhancer has superior efficiency and effectiveness in classification tasks, and the use of max-pooling and batch normalization is beneficial to higher accuracy. To make our approach more understandable, we propose a strategy to visualize the convolutional kernels as sequence logos and compare them against the JASPAR database using TOMTOM. In summary, DeepEnhancer allows researchers to train highly accurate deep models and will be broadly applicable in computational biology.

[1]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[2]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[3]  Manolis Kellis,et al.  The NF-κB genomic landscape in lymphoblastoid B cells. , 2014, Cell reports.

[4]  Morteza Mohammad Noori,et al.  Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features , 2014, PLoS Comput. Biol..

[5]  Yanjun Qi,et al.  Deep Motif: Visualizing Genomic Sequence Classifications , 2016, ArXiv.

[6]  A. Visel,et al.  Large-Scale Discovery of Enhancers from Human Heart Tissue , 2011, Nature Genetics.

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[9]  William Stafford Noble,et al.  Quantifying similarity between motifs , 2007, Genome Biology.

[10]  G. Bejerano,et al.  Enhancers: five essential questions , 2013, Nature Reviews Genetics.

[11]  B. Frey,et al.  Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.

[12]  Cesare Furlanello,et al.  A promoter-level mammalian expression atlas , 2015 .

[13]  J. T. Kadonaga,et al.  Going the distance: a current view of enhancer action. , 1998, Science.

[14]  Manolis Kellis,et al.  ChromHMM: automating chromatin-state discovery and characterization , 2012, Nature Methods.

[15]  Shane C. Dillon,et al.  The landscape of histone modifications across 1% of the human genome in five human cell lines. , 2007, Genome research.

[16]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[17]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[18]  Michael R. Green,et al.  Transcriptional regulatory elements in the human genome. , 2006, Annual review of genomics and human genetics.

[19]  David J. Arenillas,et al.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles , 2015, Nucleic Acids Res..

[20]  Michael A. Beer,et al.  Discriminative prediction of mammalian enhancers from DNA sequence. , 2011, Genome research.

[21]  Daniel Quang,et al.  DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences , 2015 .

[22]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[23]  T. Meehan,et al.  An atlas of active enhancers across human cell types and tissues , 2014, Nature.

[24]  Nathaniel D Heintzman,et al.  Finding distal regulatory elements in the human genome. , 2009, Current opinion in genetics & development.

[25]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[26]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[27]  E. Birney,et al.  High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. , 2011, Genome research.

[28]  Xiaohui Xie,et al.  DANN: a deep learning approach for annotating the pathogenicity of genetic variants , 2015, Bioinform..

[29]  Mikael Bodén,et al.  MEME Suite: tools for motif discovery and searching , 2009, Nucleic Acids Res..

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  David R. Kelley,et al.  Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015 .