Prediction and classification of protein subcellular location—sequence‐order effect and pseudo amino acid composition

Given a protein sequence, how can its subcellular location be identified? With the rapid increase in newly found protein sequences entering databanks, the problem has become increasingly important because the function of a protein is closely correlated with its localization. To deal with this challenge in practice, a dataset has been established that allows identification among the following 14 subcellular locations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. Compared with the datasets constructed by previous investigators, the current one covers the largest number of localizations, and hence many proteins that were entirely outside the scope of previous treatments can now be investigated. Meanwhile, to enhance the potential and flexibility in taking into account the sequence‐order effect, the series‐mode pseudo amino acid composition has been introduced as a representation for a protein. High success rates are obtained by the re‐substitution test, the jackknife test, and the independent dataset test. It is anticipated that the current automated method can be developed into a high‐throughput tool for practical use in both basic research and the pharmaceutical industry. © 2003 Wiley‐Liss, Inc.
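To make the idea of a pseudo amino acid composition more concrete, the following minimal Python sketch appends a few sequence‐order correlation terms to the conventional 20 amino acid frequencies. It is a simplified illustration of the general idea and not the series‐mode formulation used in the paper, which the abstract does not spell out. The function name `pseudo_composition`, the parameters `lam` and `weight`, their default values, and the single per‐residue property table `scale` are all assumptions introduced here for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def pseudo_composition(seq, scale, lam=2, weight=0.05):
    """Return a (20 + lam)-component vector for a protein sequence.

    The first 20 components are normalized amino acid frequencies; the
    remaining lam components are sequence-order correlation terms.
    `scale` maps each residue to a numeric property value (hypothetical;
    supplied by the caller). `lam` and `weight` are illustrative defaults.
    """
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # k-th correlation term: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = [
        sum((scale[seq[i]] - scale[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    # Normalize so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Placeholder property table (NOT real physicochemical data): residue index.
toy_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = pseudo_composition("ACDA", toy_scale, lam=2)
print(len(vec))
```

With `lam=0` the vector reduces to the plain amino acid composition, which ignores residue order. The extra terms are what allow two sequences with identical composition but different residue order to be distinguished. In practice `toy_scale` would be replaced by a real physicochemical scale.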
