Using multi-label algorithm to predict the post-translation modification types of proteins

Post-translational modifications (PTMs) play vital roles in protein maturation, structural stabilization, and function. Predicting the PTM types of a protein is an important and challenging problem. Most existing approaches can recognize only single-label PTM types. By introducing the multi-label K-nearest-neighbor algorithm, we propose a new predictor that can handle proteins with either single-label or multi-label PTM types. A 10-fold cross-validation was performed on a benchmark data set of proteins divided into the following four types: (1) methylation, (2) nitrosylation, (3) acetylation, and (4) phosphorylation, where many proteins belong to two or more types. For such a complex system, the outcomes achieved by our predictor for the six indices were quite promising, and it is anticipated that the predictor may become a complementary tool in this area.
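The general idea of neighbor-based multi-label prediction with k-fold cross-validation can be illustrated with a minimal sketch. This is not the predictor described above: it replaces the multi-label K-nearest-neighbor algorithm with a simplified neighbor vote, and the feature vectors, the toy data, the function names, and the use of Hamming loss as the evaluation measure are all assumptions made for illustration (the six indices are not specified here).

```python
import math

# The four PTM types named in the abstract.
PTM_TYPES = ["methylation", "nitrosylation", "acetylation", "phosphorylation"]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_labels(train_x, train_y, query, k=3, threshold=0.5):
    """Simplified neighbor vote (an assumption, not the full algorithm):
    assign every label carried by more than `threshold` of the k nearest
    training proteins."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    neighbors = order[:k]
    predicted = set()
    for label in PTM_TYPES:
        votes = sum(1 for i in neighbors if label in train_y[i])
        if votes / len(neighbors) > threshold:
            predicted.add(label)
    return predicted


def k_fold_indices(n, folds=10):
    """Split range(n) into `folds` disjoint test sets by striding."""
    return [list(range(f, n, folds)) for f in range(folds)]


def hamming_loss(true_sets, pred_sets):
    """Fraction of (protein, label) slots where prediction and truth disagree."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(PTM_TYPES))


def cross_validate(xs, ys, folds=10, k=3):
    """Predict each protein from the folds that do not contain it."""
    preds = [None] * len(xs)
    for test_idx in k_fold_indices(len(xs), folds):
        held_out = set(test_idx)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in test_idx:
            preds[i] = predict_labels(tr_x, tr_y, xs[i], k)
    return hamming_loss(ys, preds)


# Hypothetical 2-D feature vectors and label sets; some proteins carry
# two PTM types, which is what makes the task multi-label.
TOY_X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
TOY_Y = [
    {"methylation"},
    {"methylation", "acetylation"},
    {"methylation", "acetylation"},
    {"phosphorylation"},
    {"phosphorylation"},
    {"phosphorylation", "nitrosylation"},
]

if __name__ == "__main__":
    print(predict_labels(TOY_X, TOY_Y, (0.2, 0.2)))
    print(cross_validate(TOY_X, TOY_Y, folds=3))
```

Because each label is decided independently, a query protein can receive one label, several, or none, which is the property that distinguishes this setting from single-label classification.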
