Prediction of protein cellular attributes using pseudo‐amino acid composition

The cellular attributes of a protein, such as which compartment of a cell it belongs to and how it is associated with the lipid bilayer of an organelle, are closely correlated with its biological functions. The success of the Human Genome Project and the rapid increase in the number of protein sequences entering data banks have opened a challenging frontier: how to develop a fast and accurate method for predicting the cellular attributes of a protein from its amino acid sequence. The existing algorithms for predicting these attributes were all based on the amino acid composition, in which no sequence-order effect is taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns for protein sequences is extremely large, which poses a formidable difficulty for realizing this goal. To deal with this difficulty, the pseudo‐amino acid composition is introduced. It combines a set of discrete sequence correlation factors with the 20 components of the conventional amino acid composition. Using the pseudo‐amino acid composition yields a remarkable improvement in prediction quality. The success rates of prediction thus obtained are so far the highest for the same classification schemes and the same data sets. It has not escaped our notice that the concept of pseudo‐amino acid composition, as well as its mathematical framework and biochemical implications, may also have a notable impact on improving the prediction quality of other protein features. Proteins 2001;43:246–255. © 2001 Wiley‐Liss, Inc.
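To make the idea of appending sequence correlation factors to the 20 composition components concrete, the following is a minimal Python sketch, not the exact formulation of the paper. Several choices here are assumptions made only for illustration: the function name `pseudo_aa_composition`, the parameters `lam` (number of correlation tiers) and `weight`, their default values, and the use of a single property scale (Kyte-Doolittle hydropathy, as a stand-in) where a full treatment would typically combine several physicochemical properties.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in property
# scale (an assumption of this sketch, not taken from the abstract).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pseudo_aa_composition(seq, lam=3, weight=0.05, scale=HYDROPATHY):
    """Return a (20 + lam)-component vector for a protein sequence.

    The first 20 components reflect the conventional amino acid composition;
    the remaining lam components are order-dependent correlation factors.
    """
    seq = seq.upper()
    if any(a not in scale for a in seq):
        raise ValueError("sequence contains a non-standard residue")
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")

    prop = standardize(scale)
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]

    # Tier-k factor: mean squared property difference between residues
    # that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        pairs = len(seq) - k
        total = sum((prop[seq[i + k]] - prop[seq[i]]) ** 2 for i in range(pairs))
        thetas.append(total / pairs)

    # Normalize so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Two sequences with identical composition but different residue order, such as `"ACDACD"` and `"AACCDD"`, map to the same conventional 20-component vector but to different vectors under this sketch, which is the order sensitivity that plain composition lacks.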
