Bioinformatics analysis and collection of protein post-translational modification sites in human viruses

In viruses, post-translational modifications (PTMs) are essential for the viral life cycle. Identifying viral PTMs is important for understanding the mechanisms of viral infection and for finding potential drug targets. However, few studies have investigated the roles of viral PTMs in virus-human interactions using comprehensive viral PTM datasets. To fill this gap, we first developed a viral post-translational modification database (VPTMdb) that systematically collects viral PTM data. VPTMdb contains 912 PTM sites. Of these, 414 are experimentally confirmed sites on 98 proteins from 45 human viruses, manually extracted from 162 publications, and 498 were extracted from UniProtKB/Swiss-Prot. Second, we investigated viral PTM sequence motifs, the functions of the human proteins targeted by viral PTM proteins, and the characteristics of the protein domains that carry PTMs. The results showed that (i) viral PTMs share consensus motifs with human proteins for phosphorylation, SUMOylation and N-glycosylation; (ii) the human proteins targeted by viral PTM proteins are functionally related to protein targeting, translation and localization; and (iii) viral PTM sites tend to be enriched within protein domains. These findings should make an important contribution to the study of virus-human interaction. Moreover, we created a novel sequence-based classifier, VPTMpre, to help users predict phosphorylation sites on viral proteins. Finally, we implemented an online web server through which users can download viral protein PTM data and predict phosphorylation sites of interest.

Author summary

Post-translational modifications (PTMs) play an important role in regulating viral proteins. However, because available datasets have been limited, the characteristics of viral protein PTMs have not been investigated in detail. In this manuscript, we collected experimentally verified post-translational modification sites on viral proteins and analysed the viral PTM data from a bioinformatics perspective. In addition, we constructed a novel feature-based machine learning model for predicting phosphorylation sites. This is the first study to explore the roles of viral protein modification in virus infection using computational methods. This viral protein PTM data resource should provide new insights into virus-host interaction.
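The consensus motifs mentioned above can be illustrated with a short scan over a protein sequence. This is a minimal sketch, not the motif analysis performed for VPTMdb. It matches two textbook consensus patterns: the N-glycosylation sequon N-X-S/T (X not proline) and the SUMOylation motif ΨKx(E/D). The hydrophobic set used for Ψ, the function name `scan_motifs`, and the example peptide are illustrative assumptions.

```python
import re
from typing import Dict, List

# N-glycosylation sequon N-X-S/T, where X is any residue except proline.
# The pattern sits inside a lookahead so overlapping sequons are all found.
N_GLYC = re.compile(r"(?=N[^P][ST])")

# SUMOylation consensus psi-K-x-E/D. Psi (a large hydrophobic residue) is
# taken here as I, L, V, M or F; this set is an assumption.
SUMO = re.compile(r"(?=[ILVMF]K.[DE])")


def scan_motifs(seq: str) -> Dict[str, List[int]]:
    """Return 1-based positions of the modifiable residue in each motif match.

    The modified residue is the N (offset 0) for N-glycosylation and the
    K (offset 1) for SUMOylation.
    """
    seq = seq.upper()
    return {
        "N-glycosylation": [m.start() + 1 for m in N_GLYC.finditer(seq)],
        "SUMOylation": [m.start() + 2 for m in SUMO.finditer(seq)],
    }


print(scan_motifs("MNATGLKPEVKSDNPS"))
# {'N-glycosylation': [2], 'SUMOylation': [7, 11]}
```

Fixed patterns like these only locate candidate matches. Discovering which motifs are shared between viral and human proteins generally requires a different approach: comparing residue frequencies around known modified sites against a background set of unmodified sites.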
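The phosphorylation-site classifier is described here only as sequence-based and feature-based. Predictors of this kind commonly begin with a few preparation steps. First, a window of ±w residues is cut around every S, T or Y. Windows that run past the sequence ends are padded with a placeholder character. Each window is then converted into a fixed-length numeric vector. The sketch below illustrates those steps under stated assumptions and does not reproduce VPTMpre's actual pipeline. It uses plain amino acid composition as the encoding. The half-width `w = 7`, the pad character `X`, and the function names are hypothetical choices.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # filler for positions past either end of the sequence (assumption)


def candidate_windows(seq: str, w: int = 7, acceptors: str = "STY") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every candidate acceptor residue.

    Each window has length 2*w + 1 and is centred on the candidate residue.
    """
    seq = seq.upper()
    padded = PAD * w + seq + PAD * w
    windows = []
    for i, aa in enumerate(seq):
        if aa in acceptors:
            # Residue i sits at index i + w of the padded string, so the
            # centred window is padded[i : i + 2*w + 1].
            windows.append((i + 1, padded[i : i + 2 * w + 1]))
    return windows


def composition(window: str) -> List[float]:
    """Fraction of each of the 20 standard residues in the window.

    Padding counts toward the window length but toward no residue, so the
    fractions sum to less than 1 for windows that overlap a terminus.
    """
    n = len(window)
    return [window.count(aa) / n for aa in AMINO_ACIDS]


for pos, win in candidate_windows("MSTK", w=2):
    print(pos, win)
# 2 XMSTK
# 3 MSTKX
```

Each window becomes one training example, labelled by whether the centre residue is a known phosphorylation site. The resulting vectors can be passed to any standard classifier. Richer encodings, such as dipeptide composition or physicochemical descriptors, follow the same pattern of mapping each window to a fixed-length vector.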