iMethyl-Deep: N6 Methyladenosine Identification of Yeast Genome with Automatic Feature Extraction Technique by Using Deep Learning Algorithm

N6-methyladenosine (m6A) is one of the most common and well-studied post-transcriptional modifications in RNA and is involved in a wide range of biological processes. Over the past decades, high-throughput laboratory techniques have produced positive results in identifying m6A sites, but these lab processes are still time consuming and costly. Diverse computational methods have therefore been proposed to identify m6A sites accurately. In this paper, we propose a computational model named iMethyl-Deep to identify m6A sites in Saccharomyces cerevisiae on two benchmark datasets, M6A2614 and M6A6540, using single nucleotide resolution to convert each RNA sequence into a high-quality feature representation. iMethyl-Deep obtained accuracies of 89.19% and 87.44% on M6A2614 and M6A6540, respectively. The proposed method outperforms the state-of-the-art predictors by at least 8.44%, 8.96%, 8.69% and 0.173 on M6A2614, and by 15.47%, 28.52%, 25.54% and 0.5 on M6A6540, in terms of the four metrics Sp, Sn, ACC and MCC, respectively. Notably, the M6A6540 dataset had not previously been used to train a model.
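For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how sensitivity (Sn), specificity (Sp), accuracy (ACC) and the Matthews correlation coefficient (MCC) are conventionally computed from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper; the formulas are the standard definitions, which the paper is assumed to follow.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: m6A sites predicted correctly / missed.
    tn/fp: non-m6A sites predicted correctly / wrongly flagged.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; return 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

Sn, Sp and ACC lie in [0, 1] and are usually reported as percentages, whereas MCC lies in [-1, 1], which is why the MCC improvements in the abstract are given as plain numbers rather than percentages.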
