TSSub: eukaryotic protein subcellular localization by extracting features from profiles

UNLABELLED This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. This system extracts features from both profiles and amino acid sequences. Four different features are extracted from profiles by four probabilistic neural network (PNN) classifiers, respectively (the amino acid composition from whole profiles; the amino acid composition from the N-terminus of profiles; the dipeptide composition from whole profiles and the amino acid composition from fragments of profiles). In addition, a support vector machine (SVM) classifier is added to implement the residue-couple feature extracted from amino acid sequences. The results from the five classifiers are fused by an additional SVM classifier. The overall accuracies of this TSSub reach 93.0 and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. The comparison with existing methods results shows TSSub provides better prediction performance than existing methods. AVAILABILITY The web server is available from http://166.111.24.5/webtools/TSSub/index.html.

[1]  K. Nishikawa,et al.  Discrimination of intracellular and extracellular proteins by single residue and residue-pair scores , 1994 .

[2]  K Nishikawa,et al.  Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. , 1994, Journal of molecular biology.

[3]  Donald F. Specht,et al.  Probabilistic neural networks and general regression neural networks , 1996 .

[4]  P. Aloy,et al.  Relation between amino acid composition and cellular location of proteins. , 1997, Journal of molecular biology.

[5]  Gapped BLAST and PSI-BLAST: A new , 1997 .

[6]  Søren Brunak,et al.  A Neural Network Method for Identification of Prokaryotic and Eukaryotic Signal Peptides and Prediction of their Cleavage Sites , 1997, Int. J. Neural Syst..

[7]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[8]  T. Hubbard,et al.  Using neural networks for prediction of the subcellular location of proteins. , 1998, Nucleic acids research.

[9]  K. Chou,et al.  Protein subcellular location prediction. , 1999, Protein engineering.

[10]  H Nielsen,et al.  Machine learning approaches for the prediction of signal peptides and other protein sorting signals. , 1999, Protein engineering.

[11]  Zheng Yuan Prediction of protein subcellular locations using Markov chain models , 1999, FEBS letters.

[12]  Y D Cai,et al.  Using neural networks for prediction of domain structural classes. , 2000, Biochimica et biophysica acta.

[13]  K. Nakai Protein sorting signals and prediction of subcellular localization. , 2000, Advances in protein chemistry.

[14]  K. Chou,et al.  Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. , 2000, Biochemical and biophysical research communications.

[15]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[16]  S. Brunak,et al.  Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. , 2000, Journal of molecular biology.

[17]  C. Zhang,et al.  Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids. , 2001, International journal of biological macromolecules.

[18]  Z. Feng,et al.  Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition. , 2001, Biopolymers.

[19]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[20]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[21]  Minoru Kanehisa,et al.  Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs , 2003, Bioinform..

[22]  Kuo-Chen Chou,et al.  A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. , 2003, Biochemical and biophysical research communications.

[23]  Kuo-Chen Chou,et al.  Prediction and classification of protein subcellular location—sequence‐order effect and pseudo amino acid composition , 2003, Journal of cellular biochemistry.

[24]  Ke Wang,et al.  PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria , 2003, Nucleic Acids Res..

[25]  Ying Huang,et al.  Prediction of protein subcellular locations using fuzzy k-NN method , 2004, Bioinform..

[26]  Gajendra P. S. Raghava,et al.  ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST , 2004, Nucleic Acids Res..

[27]  Jian Guo,et al.  A Novel Method for Protein Subcellular Localization Based on Boosting and Probabilistic Neural Network , 2004, APBC.

[28]  K. Chou,et al.  Prediction of protein subcellular locations by GO-FunD-PseAA predictor. , 2004, Biochemical and biophysical research communications.

[29]  Jenn-Kang Hwang,et al.  Predicting subcellular localization of proteins for Gram‐negative bacteria by support vector machines based on n‐peptide compositions , 2004, Protein science : a publication of the Protein Society.

[30]  Jian Guo,et al.  A novel method for protein subcellular localization: Combining residue-couple model and SVM , 2005, APBC.

[31]  Gajendra P. S. Raghava,et al.  PSLpred: prediction of subcellular localization of bacterial proteins , 2005, Bioinform..

[32]  Manoj Bhasin,et al.  SVM-based method for subcellular localization of human proteins using amino acid compositions , their order and similarity search , 2005 .

[33]  M. Bhasin,et al.  Support Vector Machine-based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions, Their Order, and Similarity Search* , 2005, Journal of Biological Chemistry.