Position-Specific Analysis and Prediction of Protein Pupylation Sites Based on Multiple Features

Pupylation is one of the most important posttranslational modifications of proteins, and accurate identification of pupylation sites will facilitate understanding of its molecular mechanism. Alongside conventional experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, we developed a novel predictor of pupylation sites. First, the maximum relevance minimum redundancy (mRMR) and incremental feature selection methods were applied to five kinds of features to select the optimal feature set. The prediction model was then built on the optimal feature set using the support vector machine algorithm. The overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was 0.764, and the Matthews correlation coefficient was 0.522, indicating good predictive performance. Feature analysis showed that all feature types contributed to the prediction of protein pupylation sites. Further site-specific feature analysis revealed that the features of sites surrounding the central lysine contributed more to the determination of pupylation sites than those of other sites.
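The feature selection and evaluation steps named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes discrete toy features, estimates mutual information from raw counts, and uses the difference form of the mRMR score (relevance minus mean redundancy). The function names, the toy data, and the tie-breaking rule in the incremental step are all hypothetical. The `evaluate` callback stands in for whatever classifier score is used; in the study this would be the jackknife performance of a support vector machine, which is not reproduced here.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedy mRMR ordering of feature names.

    At each step, pick the feature with the largest value of
    relevance I(feature; label) minus its mean mutual information
    with the features already selected.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranked, evaluate):
    """Score every top-k prefix of the ranking and keep the best one.

    Ties are broken in favor of the smaller feature set (an assumption).
    """
    scored = [(evaluate(ranked[:k]), k) for k in range(1, len(ranked) + 1)]
    best_score, best_k = max(scored, key=lambda t: (t[0], -t[1]))
    return ranked[:best_k], best_score


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): 1 = pupylated lysine, 0 = non-pupylated lysine.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "f_informative": [1, 1, 1, 0, 0, 0, 0, 0],  # agrees with 7 of 8 labels
    "f_duplicate":   [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy, fully redundant
    "f_noise":       [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the labels
}
ranking = mrmr_rank(features, labels)
print(ranking)
```

On this toy data the duplicated feature is ranked last even though it is as relevant as the first pick, because its redundancy with the already selected copy outweighs its relevance. That trade-off is what separates mRMR from ranking by relevance alone.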
