RNA-MethylPred: A high-accuracy predictor to identify N6-methyladenosine in RNA.

N6-methyladenosine (m6A) is present ubiquitously in the RNA of living organisms, from Escherichia coli to humans. Nonetheless, the exact molecular mechanism of this modification remains unclear. Experimental identification of m6A sites is time-consuming and expensive, so accurate bioinformatics tools are a desirable alternative for rapid, large-scale identification of N6-methyladenosine sites. In this study, a new bioinformatics model, RNA-MethylPred, was developed by combining three feature extraction schemes: bi-profile Bayes, dinucleotide composition, and k-nearest neighbor (KNN) scores. RNA-MethylPred yielded a Matthews correlation coefficient (MCC) of 0.53 in a jackknife test, which was 0.24 higher than that of iRNA-Methyl and 0.13 higher than that of pRNAm-PC. These improvements suggest that RNA-MethylPred might be a powerful and complementary tool for further experimental investigation of N6-methyladenosine modification.
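As an illustration only, two of the quantities named above can be sketched in a few lines of Python: the dinucleotide composition of a sequence window and the MCC computed from a confusion matrix. The function names, the fixed AA-to-UU feature order, and the handling of a zero denominator are assumptions made for this sketch; the abstract does not specify the window length, encoding details, or classifier, so this is not the RNA-MethylPred implementation.

```python
from collections import Counter
from itertools import product
from math import sqrt


def dinucleotide_composition(seq):
    """Return 16 dinucleotide frequencies in fixed order AA, AC, ..., UU.

    Hypothetical helper: counts overlapping dinucleotides and divides by
    the number of dinucleotides in the sequence.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [
        counts[a + b] / total if total else 0.0
        for a, b in product("ACGU", repeat=2)
    ]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it a more balanced summary than accuracy when positive and negative sites are unevenly represented.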
