Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition.

A new representation of the protein sequence is developed in this paper, in which each protein is represented by a 20-dimensional (20D) vector of unit length. Inspired by the principle of superposition of states in quantum mechanics, the squares of the 20 components of the vector correspond to the amino acid composition. Using this representation of the primary sequence together with the Bayes Discriminant Algorithm, the subcellular location of prokaryotic proteins was predicted. In the jackknife test, the overall predictive accuracy is 3% higher than that obtained by using the amino acid composition directly for the dataset with sequence identity below 90%, and 5% higher for the dataset with sequence identity below 80%. The higher predictive accuracy indicates that the present way of extracting information from the primary sequence is effective. Since subcellular location restricts a protein's possible function, the present method should also be useful for the systematic analysis of genome data. The program used in this paper is available on request.
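The representation described above can be illustrated with a minimal sketch. It assumes that each component is the non-negative square root of the corresponding amino acid frequency, which makes the squared components equal the composition and gives the vector unit length automatically. The function names, the residue ordering, and the example sequence are hypothetical and are not taken from the authors' program; the Bayes discriminant step is not shown.

```python
import math

# The 20 standard amino acids in one-letter code. The ordering is an
# arbitrary choice for this sketch, not one specified by the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino acid frequencies of a sequence (they sum to 1).

    Characters outside the 20 standard residues are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def unit_vector(sequence):
    """Return a 20D vector whose squared components are the composition.

    Assumption: each component is the non-negative square root of the
    frequency, so the Euclidean norm of the vector is 1.
    """
    return [math.sqrt(f) for f in composition(sequence)]


if __name__ == "__main__":
    vec = unit_vector("MKTAYIAKQRQISFVKSHFSRQ")  # made-up example sequence
    norm = math.sqrt(sum(x * x for x in vec))
    print(len(vec), round(norm, 6))
```

Because the frequencies sum to 1, the norm of the resulting vector equals 1 up to floating-point rounding, so no separate normalization step is needed.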
