Detecting N6-methyladenosine sites from RNA transcriptomes using random forest

Abstract N6-methyladenosine (m6A) modifications are one of the most frequently occurring RNA post-transcriptional modifications. These modifications perform vital roles in different biological processes, including localization and translation of proteins, X chromosome inactivation, cell stability, microRNA regulation, and reprogramming. Abnormal changes in m6A sites may lead to several abnormalities, including cancer, brain-related disorders, and many other life-threatening diseases. Precise detection of m6A modifications is therefore crucial for the diagnosis and treatment of these diseases. Existing methods detect m6A sites inefficiently, especially in yeast transcriptomes (owing to their varied structure), and the underlying computational techniques fail to capture the information encoded around the m6A sites. In this work, we propose a novel method, called the m6A-pred predictor, that uses a fusion of statistical and chemical properties of the nucleotides to precisely predict the presence of m6A sites in RNA sequences. Fusing multiple types of features produces a high-dimensional vector, which is further optimized using an evolutionary algorithm. Finally, a random forest classifier detects m6A sites using the most discriminative features. The results, benchmarked on yeast transcriptomes, indicate that the m6A-pred predictor outperforms all previously reported predictors, with an accuracy of 78.58%, a specificity of 79.65%, and a Matthews correlation coefficient of 0.5717.
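For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, specificity, and the Matthews correlation coefficient are conventionally computed from the confusion-matrix counts of a binary classifier. The function name `evaluate` and the counts in the usage line are hypothetical illustrations; they are not the paper's code or data.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: m6A sites predicted correctly / missed.
    tn/fp: non-m6A sites predicted correctly / falsely flagged.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = evaluate(tp=80, tn=90, fp=10, fn=20)
print(metrics)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are of different sizes.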
