6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism

DNA methylation is one of the most widespread epigenetic modifications, and DNA N6-methyladenine (6mA) plays a key role in many biological regulatory processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Several machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness restrict their usability in epigenetic studies, which indicates a clear need for new computational methods. In this paper, we develop a novel computational predictor, 6mAPred-MSFF, a deep learning framework based on a multi-scale feature fusion mechanism that identifies 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build lightweight, deep neural networks. Compared with existing predictors based on traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features, and it captures the characteristics of 6mA sites more effectively. In benchmarking comparisons, our method outperforms the state-of-the-art methods under 5-fold cross-validation on seven datasets from six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, 6mAPred-MSFF achieves a sensitivity of 97.88% and a specificity of 94.64% under 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained on the rice data also predicts 6mA sites well in five other species: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster, with prediction accuracies of 98.51%, 93.02%, and 91.53%, respectively. Moreover, through experimental comparison, we explore how different encoding schemes and feature descriptors affect performance when training and testing the proposed model.
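
For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how sensitivity and specificity are conventionally computed for a binary 6mA/non-6mA classifier. It is not taken from the 6mAPred-MSFF code; the function name `sensitivity_specificity` and the label convention (1 = 6mA site, 0 = non-6mA site) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention (hypothetical, for illustration only):
    1 marks a 6mA site (positive), 0 marks a non-6mA site (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")

    sensitivity = tp / (tp + fn)  # fraction of true 6mA sites recovered
    specificity = tn / (tn + fp)  # fraction of non-6mA sites correctly rejected
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 1]
    print(sensitivity_specificity(truth, preds))
```

In a 5-fold cross-validation setting, a function like this would typically be applied to the held-out fold in each round and the results averaged; how the paper aggregates across folds is not stated in the abstract.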
