Imbalance learning for the prediction of N6-Methylation sites in mRNAs

Background: N6-methyladenosine (m6A) is an important epigenetic modification that plays various roles in mRNA metabolism and embryogenesis and is directly related to human diseases. To identify m6A sites on a large scale, machine learning methods have been developed to predict them. However, these methods have two main drawbacks. First, m6A samples are far fewer than non-m6A samples, and the balanced learning approaches these methods use do not learn adequately from such imbalanced data. Second, the features these methods use do not represent m6A sequence characteristics well.

Results: We propose to use cost-sensitive learning to resolve the data imbalance issue in the human mRNA m6A prediction problem. The cost-sensitive approach is applied to the entire imbalanced dataset, without randomly selecting an equal-size subset of negative samples, so that all available data are learned from. Along with site location and entropy features, the top-ranked positions with the highest single nucleotide polymorphism specificity in the window sequences are taken as new features in our imbalance learning. On an independent dataset, our overall prediction performance is substantially better than that of the existing predictors. Our method is also more robust to changes in imbalance in tests on 9 datasets whose imbalance ratios range from 1:1 to 9:1, and it outperforms the existing predictors on 1226 individual transcripts. The new types of features are found to be of high significance in m6A prediction. Case studies on the genes c-Jun and CBFB illustrate in detail how the method improves prediction performance.

Conclusion: The proposed cost-sensitive model and the new features are useful in human mRNA m6A prediction. Our method achieves better correctness and robustness than the existing predictors in the independent test and the case studies. The results suggest that imbalance learning is a promising way to improve the performance of m6A prediction.
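The cost-sensitive idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier or the cost values, so the sketch assumes a plain logistic regression, inverse-frequency class weights, and synthetic one-dimensional data with a 9:1 negative-to-positive ratio. All function names (`class_weights`, `train_logreg`, `run_demo`) are hypothetical.

```python
import math
import random


def class_weights(labels):
    """Inverse-frequency weights: the rarer class gets the larger cost."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {1: n / (2.0 * n_pos), 0: n / (2.0 * n_neg)}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, weights, lr=0.5, epochs=1000):
    """Batch gradient descent on a class-weighted logistic loss."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    total = sum(weights[label] for label in y)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = weights[yi] * (p - yi)  # misclassification cost scales the gradient
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict(X, w, b):
    """Label a sample positive when its linear score is above zero."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]


def recall(y_true, y_pred):
    """Fraction of true positives that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)


def run_demo(seed=0):
    """Compare unweighted and cost-sensitive training on synthetic 9:1 data."""
    rng = random.Random(seed)
    # Assumed toy data: 20 positives around +1, 180 negatives around -1.
    X = [[rng.gauss(1.0, 1.0)] for _ in range(20)]
    X += [[rng.gauss(-1.0, 1.0)] for _ in range(180)]
    y = [1] * 20 + [0] * 180
    plain = train_logreg(X, y, {0: 1.0, 1: 1.0})
    weighted = train_logreg(X, y, class_weights(y))
    return recall(y, predict(X, *plain)), recall(y, predict(X, *weighted))


if __name__ == "__main__":
    r_plain, r_weighted = run_demo()
    print("recall, unweighted:", r_plain)
    print("recall, cost-sensitive:", r_weighted)
```

The point of the sketch is that every negative sample is kept and the imbalance is handled through the loss, rather than by downsampling negatives to match the number of positives. Many libraries expose the same idea through a class-weight setting, which would be the more practical choice than hand-written gradient descent.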
