Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods

RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification because of its role in regulating the stability, structure, processing, and translation of RNA. Disruption of m6A homeostasis may result in defects in stem cell regulation, reduced fertility, and increased cancer risk. To date, experimental detection and quantification of RNA m6A modification remain time-consuming and labor-intensive. Only a limited number of epitranscriptome samples exist in current databases, and a matched RNA methylation profile is seldom available for a given biological problem of interest. Because gene expression data are readily available for most biological problems, it would be appealing to estimate RNA methylation status from gene expression data using in silico methods. In this study, we explored the feasibility of computationally predicting RNA methylation status from gene expression data with classification and regression methods, using mouse RNA methylation data collected from 73 experimental conditions. For classification, Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) models were constructed. SVM and RF both achieved the best performance, with a mean area under the curve (AUC) of 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on the sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found to be statistically significant: phosphoprotein, Src homology 3 (SH3) domain, and endoplasmic reticulum. All three terms were found to be closely related to the m6A pathway. For regression analysis, an Elastic Net model was implemented, yielding a mean Pearson correlation coefficient of 0.68 and a mean Spearman correlation coefficient of 0.64. Our exploratory study suggests that gene expression data could be used to construct predictors of m6A methylation status with adequate accuracy.
Our work shows, for the first time, that RNA methylation status may be predicted from matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stages of a study.
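
The evaluation metrics reported above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: that the AUC is the area under the ROC curve, computed separately for each sample over its sites and then averaged, and that the Pearson and Spearman coefficients are likewise computed per sample between observed and predicted methylation levels and then averaged. The (labels, scores) and (observed, predicted) pair layout and the helper names (mean_auc, mean_correlations) are hypothetical. Model training (ENLR, SVM, RF, Elastic Net) is omitted, and the toy numbers are synthetic, not data from the study. The AUC uses the rank-based (Mann-Whitney) formulation, and Spearman's coefficient is computed as the Pearson coefficient of average ranks.

```python
from statistics import mean


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) site pairs in which the positive (methylated) site
    gets the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical layout (an assumption): one (labels, scores) pair per sample,
# where labels are 1 for methylated and 0 for unmethylated sites.
def mean_auc(samples):
    return mean(auc(labels, scores) for labels, scores in samples)


# Hypothetical layout (an assumption): one (observed, predicted) pair of
# methylation levels per sample; returns (mean Pearson, mean Spearman).
def mean_correlations(samples):
    rs = [pearson(obs, pred) for obs, pred in samples]
    rhos = [spearman(obs, pred) for obs, pred in samples]
    return mean(rs), mean(rhos)


if __name__ == "__main__":
    # Synthetic toy data, not results from the study.
    cls_samples = [
        ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # per-sample AUC 1.0
        ([1, 0, 1, 0], [0.7, 0.8, 0.6, 0.1]),  # per-sample AUC 0.5
    ]
    print(mean_auc(cls_samples))  # 0.75
    reg_samples = [([0.1, 0.4, 0.3, 0.8], [0.2, 0.5, 0.3, 0.7])]
    print(mean_correlations(reg_samples))
```

Averaging per-sample scores, rather than pooling all sites into a single curve, gives each experimental condition equal weight. Under this assumption, the spread of the per-sample AUC values would correspond to the AUC spread compared between SVM and RF.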