Predicting pupylation sites in prokaryotic proteins using semi-supervised self-training support vector machine algorithm.

Pupylation is an important post-translational modification of prokaryotic proteins and plays a key role in regulating various biological processes. Accurately identifying pupylation sites is crucial for understanding the mechanisms that underlie pupylation. Although several computational methods have been developed to identify pupylation sites, their prediction accuracy remains unsatisfactory. Here, a novel bioinformatics tool named IMP-PUP is proposed to improve the prediction of pupylation sites. IMP-PUP uses the composition of k-spaced amino acid pairs as features and is trained with a modified semi-supervised self-training support vector machine (SVM) algorithm, which iteratively trains a series of SVM classifiers on both annotated and non-annotated pupylated proteins. Computational results show that IMP-PUP achieves areas under the receiver operating characteristic curve (AUC) of 0.91, 0.73, and 0.75 on our training set, Tung's testing set, and our testing set, respectively. These values are higher than those obtained with the different error costs SVM algorithm and with the original self-training SVM algorithm. Independent tests also show that IMP-PUP significantly outperforms three existing pupylation site predictors: GPS-PUP, iPUP, and pbPUP. IMP-PUP can therefore be a useful tool for the accurate prediction of pupylation sites. A MATLAB software package for IMP-PUP is available at https://juzhe1120.github.io/.
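To make the feature encoding concrete, the following is a minimal Python sketch of the composition of k-spaced amino acid pairs computed on sequence windows centered on lysine residues. It is an illustration under stated assumptions, not the IMP-PUP implementation (which is distributed as a MATLAB package): the flank of 10 residues, the maximum spacing `k_max=4`, the padding symbol `X`, and the function names `lysine_windows` and `cksaap` are all hypothetical choices made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for windows that extend past a terminus
ALPHABET = AMINO_ACIDS + PAD
# All ordered residue pairs over the 21-symbol alphabet (21 * 21 = 441).
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def lysine_windows(sequence, flank=10):
    """Yield (1-based position, window) for every lysine in the sequence.

    Each window holds `flank` residues on either side of the lysine and is
    padded with PAD where it runs past the start or end of the protein.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i:i + 2 * flank + 1]


def cksaap(window, k_max=4):
    """Return the k-spaced pair composition of a window as a flat list.

    For each spacing k in 0..k_max, count every pair of residues separated
    by exactly k positions and divide by the number of such pairs in the
    window. Pairs containing a symbol outside ALPHABET are not counted.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in PAIRS)
    return features
```

With these assumed settings, each candidate lysine is described by 5 × 441 = 2205 pair frequencies. Vectors of this kind could then be passed to an SVM, with predictions on non-annotated lysines fed back as additional training labels in a self-training loop.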
