Enhancer-DSNet: A Supervisedly Prepared Enriched Sequence Representation for the Identification of Enhancers and Their Strength

Identifying enhancers and predicting their strength is important for understanding gene expression regulation and is currently an active area of research. However, identifying enhancers through experimental approaches is extremely time-consuming and labor-intensive. Several machine learning methodologies have been proposed to discriminate enhancers from other regulatory elements and to estimate their strength. Existing approaches use various statistical measures for feature encoding, which capture residue-specific physico-chemical properties to a certain extent but ignore the semantic and positional information of residues. This paper presents “Enhancer-DSNet”, a two-layer deep neural network that uses a novel k-mer based sequence representation scheme prepared by fusing associations between k-mer positions and sequence type. The proposed Enhancer-DSNet methodology is evaluated on a publicly available benchmark dataset and an independent test set. Experimental results on the independent test set indicate that Enhancer-DSNet outperforms the most recent predictor by 2%, 1%, 2%, and 5% in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient for the enhancer identification task, and by 15%, 21%, and 39% in terms of accuracy, specificity, and Matthews correlation coefficient for the strong/weak enhancer prediction task.
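
As a rough illustration of two ideas the abstract relies on, the sketch below splits a DNA sequence into overlapping k-mers and computes the four reported evaluation measures from confusion-matrix counts. It is not the Enhancer-DSNet implementation: the function names `kmers` and `binary_metrics`, the choice of k = 3, and the example sequence and counts are all assumptions made for illustration, and the supervised fusion of k-mer positions with sequence type is not reproduced here.

```python
import math


def kmers(sequence, k=3):
    """Return the overlapping k-mers of a sequence, in positional order.

    k=3 is an illustrative default, not a value taken from the paper.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity and MCC from counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives of a binary classifier.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical sequence and counts, for demonstration only.
    print(kmers("ACGTAC", k=3))
    print(binary_metrics(tp=80, tn=70, fp=30, fn=20))
```

The Matthews correlation coefficient uses all four confusion-matrix cells, so it can differ noticeably between two predictors whose accuracies are close, which may be why the abstract reports it alongside accuracy.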
