A Grid Search-Based Multilayer Dynamic Ensemble System to Identify DNA N4-Methylcytosine Using Deep Learning Approach

DNA N4-methylcytosine (4mC) is an epigenetic modification of DNA (deoxyribonucleic acid) that influences gene function, including protein interactions, DNA conformation and stability, and the control of gene expression throughout cell development and genomic imprinting. It also plays a crucial role in the restriction–modification system. To further understand the function and regulatory mechanism of 4mC, it is essential to locate 4mC sites precisely and to determine their chromosomal distribution. This research aims to design an efficient, high-throughput, discriminative computational system that predicts 4mC sites using the natural language processing method “word2vec” and a multi-configured 1D convolutional neural network (1D CNN). In this article, we propose a grid search-based multi-layer dynamic ensemble system (GS-MLDS) that can enhance the knowledge gained at each level. Each layer uses a grid search-based weight search to find the optimal accuracy while minimizing computation time and the number of additional layers. We used eight publicly available benchmark datasets collected from different sources to test the proposed model’s efficiency. The accuracies obtained in testing were 0.978, 0.954, 0.944, 0.961, 0.950, 0.973, 0.948, 0.952, 0.961, and 0.980. The proposed model was also compared with 16 distinct models, and the comparison indicates that it can accurately predict 4mC sites.
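To make the idea of a grid search over ensemble weights concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that several base models (for example, differently configured 1D CNNs) each output a probability that a sequence contains a 4mC site, and that a layer blends those probabilities with weights chosen from a coarse grid to maximize accuracy. The function names, the grid resolution, and the toy data are all hypothetical.

```python
from itertools import product


def weighted_vote(prob_rows, weights):
    """Blend per-model probabilities for each sample using the given weights."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total for row in prob_rows]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def grid_search_weights(prob_rows, labels, steps=5):
    """Try every weight combination on a coarse grid.

    Returns (best_weights, best_accuracy). The grid resolution `steps`
    is an illustrative assumption, not a value taken from the paper.
    """
    n_models = len(prob_rows[0])
    grid = [i / (steps - 1) for i in range(steps)]  # 0.0, 0.25, ..., 1.0
    best_weights, best_acc = None, -1.0
    for weights in product(grid, repeat=n_models):
        if sum(weights) == 0:
            continue  # an all-zero weighting cannot be normalized
        acc = accuracy(weighted_vote(prob_rows, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation data (made up for illustration): each row holds the
# probabilities from three hypothetical base models for one sequence.
probs = [
    (0.9, 0.4, 0.6),
    (0.2, 0.7, 0.3),
    (0.8, 0.3, 0.4),
    (0.3, 0.6, 0.2),
    (0.6, 0.8, 0.9),
    (0.4, 0.2, 0.7),
]
labels = [1, 0, 1, 0, 1, 1]

weights, acc = grid_search_weights(probs, labels)
print("best weights:", weights, "accuracy:", acc)
```

Because the grid contains the weightings that select a single model, the blended accuracy on the data used for the search is never lower than that of the best individual model. In a multi-layer arrangement, the blended output of one layer could serve as an input to the next; the number of layers and the stopping rule are not specified in the abstract.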
