DeepRMethylSite: a deep learning based approach for prediction of arginine methylation sites in proteins.

Methylation, one of the most prominent post-translational modifications of proteins, regulates many important cellular functions. Although several model-based methylation site predictors have been reported, all existing methods employ machine learning strategies, such as support vector machines and random forests, to predict methylation sites from a set of "hand-selected" features. As a consequence, the resulting models may be biased toward the chosen feature set. Moreover, because the number of features is large, model development can be computationally expensive. In this paper, we propose an alternative approach based on deep learning to predict arginine methylation sites. Our model, which we term DeepRMethylSite, is computationally less expensive than traditional feature-based methods and avoids the potential biases that can arise through feature selection. On independent testing with our dataset, DeepRMethylSite achieved a sensitivity (SN) of 68%, a specificity (SP) of 82% and a Matthews correlation coefficient (MCC) of 0.51. Importantly, in side-by-side comparisons with other state-of-the-art methylation site predictors, our method performs on par with or better than them in all scoring metrics tested.
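For readers unfamiliar with the three reported metrics, here is a minimal sketch of how SN, SP and MCC are computed from the four counts of a binary confusion matrix (true/false positives and negatives). The function name `classification_metrics` and the counts in the example are hypothetical illustrations; they are not the authors' evaluation code or data.

```python
import math


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero (undefined metric).
    return num / den if den else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Hypothetical helper: sensitivity, specificity and Matthews correlation coefficient."""
    # Sensitivity (true positive rate): fraction of methylated sites predicted as methylated.
    sn = _safe_div(tp, tp + fn)
    # Specificity (true negative rate): fraction of non-methylated sites predicted as non-methylated.
    sp = _safe_div(tn, tn + fp)
    # MCC combines all four confusion-matrix cells; it ranges from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, denom)
    return {"SN": sn, "SP": sp, "MCC": mcc}


# Illustrative counts only, not results from the paper.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({k: round(v, 3) for k, v in m.items()})
```

With these illustrative counts the function returns SN = 0.8, SP = 0.9 and MCC of roughly 0.70. Unlike SN or SP alone, MCC uses all four cells of the confusion matrix, so it remains informative when positive and negative sites are imbalanced.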
