Identifying enhancer-promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism

MOTIVATION Identification of enhancer-promoter interactions (EPIs) is of great significance to human development. However, experimental methods for identifying EPIs are costly in time, labor and money. Research has therefore increasingly focused on developing computational methods to solve this problem. Unfortunately, most existing computational methods require a variety of genomic data, which are not always available, especially for a new cell line, and this limits their large-scale practical application. As an alternative, computational methods that use sequences only have great prospects for genome-scale application.

RESULTS In this article, we propose a new deep learning method, EPIVAN, that predicts long-range EPIs using only genomic sequences. To explore the key sequence characteristics, we first use pre-trained DNA vectors to encode enhancers and promoters; we then use one-dimensional convolution and a gated recurrent unit to extract local and global features; finally, an attention mechanism boosts the contribution of key features, further improving the performance of EPIVAN. Benchmarking comparisons on six cell lines show that EPIVAN performs better than state-of-the-art predictors. Moreover, we build a general model that has transfer ability and can be used to predict EPIs in various cell lines.

AVAILABILITY AND IMPLEMENTATION The source code and data are available at: https://github.com/hzy95/EPIVAN.
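The pipeline described above (k-mer vectors, one-dimensional convolution, a gated recurrent unit, then attention) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the EPIVAN implementation: the k-mer length, all layer sizes, the random stand-in for the pre-trained DNA vectors, the omission of biases and pooling, and the way the enhancer and promoter branches are joined are all hypothetical choices made for illustration. The weights are untrained, so the score itself is meaningless.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, EMB, FILTERS, WIDTH, HIDDEN = 6, 8, 4, 5, 4  # illustrative sizes, all assumed

# Stand-in for pre-trained DNA vectors: one random row per k-mer (assumption).
VOCAB = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=K))}
EMBEDDING = rng.normal(size=(len(VOCAB), EMB))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(seq):
    """Map a DNA sequence to a (num_kmers, EMB) matrix of k-mer vectors."""
    ids = [VOCAB[seq[i:i + K]] for i in range(len(seq) - K + 1)]
    return EMBEDDING[ids]

def conv1d(x, kernels):
    """Valid 1-D convolution with ReLU; kernels has shape (WIDTH, EMB, FILTERS)."""
    width = kernels.shape[0]
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    return np.maximum(np.einsum("twe,wef->tf", windows, kernels), 0.0)

def gru(x, Wz, Wr, Wh):
    """Minimal GRU (biases omitted); returns the hidden state at every step."""
    h = np.zeros(Wz.shape[1])
    states = []
    for x_t in x:
        xh = np.concatenate([x_t, h])
        z = sigmoid(xh @ Wz)  # update gate
        r = sigmoid(xh @ Wr)  # reset gate
        h_tilde = np.tanh(np.concatenate([x_t, r * h]) @ Wh)  # candidate state
        h = (1 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

def attention(states, w):
    """Softmax-weighted sum of states; alpha shows which positions dominate."""
    scores = states @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ states, alpha

def predict(enhancer, promoter, params):
    """Score one enhancer-promoter pair; returns (probability, attention weights)."""
    # Local features: a separate convolution for each sequence.
    feats = [conv1d(encode(s), k)
             for s, k in zip((enhancer, promoter), params["kernels"])]
    # Global features: a GRU over the concatenated feature maps (assumed wiring).
    states = gru(np.concatenate(feats), *params["gru"])
    context, alpha = attention(states, params["att"])
    return float(sigmoid(context @ params["out"])), alpha

# Untrained random weights, used only to exercise the forward pass.
params = {
    "kernels": [rng.normal(size=(WIDTH, EMB, FILTERS)) * 0.1 for _ in range(2)],
    "gru": [rng.normal(size=(FILTERS + HIDDEN, HIDDEN)) * 0.1 for _ in range(3)],
    "att": rng.normal(size=HIDDEN),
    "out": rng.normal(size=HIDDEN),
}
score, alpha = predict("ACGTACGTGGCTAGCTAACG", "TTGACGGCTAGCATCGATCG", params)
```

Here `score` is a value between 0 and 1 and `alpha` holds one attention weight per position of the concatenated feature map. In a trained model, large entries of `alpha` would indicate the positions that contribute most to the prediction.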
