Prediction of subcellular protein localization based on functional domain composition.

Assigning subcellular localization (SL) to proteins is one of the major tasks of functional proteomics. Despite the impressive technical advances of the past decades, it is still time-consuming and laborious to experimentally determine SL on a high throughput scale. Thus, computational predictions are the preferred method for large-scale assignment of protein SL, and if appropriate, followed up by experimental studies. In this report, using a machine learning approach, the Nearest Neighbor Algorithm (NNA), we developed a prediction system for protein SL in which we incorporated a protein functional domain profile. The overall accuracy achieved by this system is 93.96%. Furthermore, comparisons with other methods have been conducted to demonstrate the validity and efficiency of our prediction system. We also provide an implementation of our Subcellular Location Prediction System (SLPS), which is available at http://pcal.biosino.org.

[1]  Ying Huang,et al.  Prediction of protein subcellular locations using fuzzy k-NN method , 2004, Bioinform..

[2]  Kuo-Chen Chou,et al.  Predicting protein localization in budding Yeast , 2005, Bioinform..

[3]  K. Chou,et al.  Prediction of protein subcellular locations by GO-FunD-PseAA predictor. , 2004, Biochemical and biophysical research communications.

[4]  Kuo-Chen Chou,et al.  Predicting enzyme family classes by hybridizing gene product composition and pseudo-amino acid composition. , 2005, Journal of theoretical biology.

[5]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[6]  Robert D. Finn,et al.  Pfam: clans, web tools and services , 2005, Nucleic Acids Res..

[7]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[8]  Y. Z. Chen,et al.  Protein function classification via support vector machine approach. , 2003, Mathematical biosciences.

[9]  B. Rost,et al.  Adaptation of protein surfaces to subcellular location. , 1998, Journal of molecular biology.

[10]  Minoru Kanehisa,et al.  Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs , 2003, Bioinform..

[11]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[12]  Zheng Yuan Prediction of protein subcellular locations using Markov chain models , 1999, FEBS letters.

[13]  Maria Jesus Martin,et al.  The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 , 2003, Nucleic Acids Res..

[14]  Kuo-Chen Chou,et al.  Predicting 22 protein localizations in budding yeast. , 2004, Biochemical and biophysical research communications.

[15]  S. Brunak,et al.  Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. , 2000, Journal of molecular biology.

[16]  T. Hubbard,et al.  Using neural networks for prediction of the subcellular location of proteins. , 1998, Nucleic acids research.

[17]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[18]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[19]  Adam Godzik,et al.  Clustering of highly homologous sequences to reduce the size of large protein databases , 2001, Bioinform..