An overview on predicting the subcellular location of a protein

This paper reviews the problem of predicting the subcellular location of a protein. Five methods of extracting information from the global sequence, each used with the Bayes discriminant algorithm, are reviewed: 1) the auto-correlation functions of amino acid indices along the sequence; 2) the quasi-sequence-order approach; 3) the pseudo-amino acid composition; 4) the unified attribute vector in Hilbert space; and 5) Zp parameters extracted from the Zp curve. The predictive accuracy actually achieved is closely related to the degree of similarity between the training and testing sets or, in a cross-validated study, to the average degree of pairwise similarity within the dataset. Many researchers hold that, because of the high pairwise sequence identity in the datasets, the high accuracies currently reported cannot guarantee that the available algorithms are effective in practical prediction. Others argue that a dataset used for developing software should reflect the reality found in nature: some subcellular locations contain only a small number of proteins, and some of those proteins share a high percentage of sequence similarity. Owing to the complexity of the problem, sophisticated and specialized programs are needed both for constructing datasets and for improving prediction. Finding the targeting information in the mature protein sequence and properly combining it with sorting signals may further improve the overall predictive accuracy and bring the prediction into practical use.
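As an illustration of the first kind of feature listed above, the following is a minimal sketch of an auto-correlation function of an amino acid index along a sequence. It assumes one common form, r_n = (1/(N-n)) * sum over i of h_i * h_(i+n), and uses the Kyte-Doolittle hydropathy scale purely as an example index; the function name `autocorrelation`, the `max_lag` parameter, and the choice of index are illustrative assumptions and are not taken from the paper.

```python
# Kyte-Doolittle hydropathy values, used here only as an example amino acid
# index (an assumption; any other index could be substituted).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(sequence, index=KD, max_lag=5):
    """Return [r_1, ..., r_max_lag] for a protein sequence.

    r_n is the mean of h_i * h_(i+n) over all residue pairs that are
    n positions apart, where h_i is the index value of residue i.
    """
    values = [index[aa] for aa in sequence.upper()]
    n_res = len(values)
    if max_lag >= n_res:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        # Products of index values for residues separated by `lag` positions.
        products = [values[i] * values[i + lag] for i in range(n_res - lag)]
        features.append(sum(products) / len(products))
    return features
```

A fixed-length vector such as `autocorrelation(seq, max_lag=5)` could then be passed to a classifier regardless of sequence length, which is the practical appeal of global-sequence features of this kind.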


[37]  G. Heijne,et al.  ChloroP, a neural network‐based method for predicting chloroplast transit peptides and their cleavage sites , 1999, Protein science : a publication of the Protein Society.

[38]  P. Aloy,et al.  Relation between amino acid composition and cellular location of proteins. , 1997, Journal of molecular biology.

[39]  K. R. Woods,et al.  Prediction of protein antigenic determinants from amino acid sequences. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[40]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[41]  K. Chou,et al.  Prediction of membrane protein types and subcellular locations , 1999, Proteins.

[42]  C. Zhang,et al.  Predicting protein folding types by distance functions that make allowances for amino acid interactions. , 1994, The Journal of biological chemistry.

[43]  C. Zhang,et al.  Prediction of protein (domain) structural classes based on amino-acid index. , 1999, European journal of biochemistry.

[44]  C. Zhang,et al.  Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids. , 2001, International journal of biological macromolecules.

[45]  K. Chou,et al.  A key driving force in determination of protein structural classes. , 1999, Biochemical and biophysical research communications.

[46]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[47]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[48]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[49]  G M Maggiora,et al.  Domain structural class prediction. , 1998, Protein engineering.

[50]  S. Fields,et al.  Proteomics. Proteomics in genomeland. , 2001, Science.

[51]  K. Chou Using subsite coupling to predict signal peptides. , 2001, Protein engineering.

[52]  Robert F. Murphy,et al.  Towards a Systematics for Protein Subcellular Location: Quantitative Description of Protein Localization Patterns and Automated Analysis of Fluorescence Microscope Images , 2000, ISMB.

[53]  Paul Horton,et al.  A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins , 1996, ISMB.

[54]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[55]  T. Blundell,et al.  Catching a common fold , 1993, Protein science : a publication of the Protein Society.

[56]  P Bork,et al.  The immunoglobulin fold. Structural classification, sequence patterns and common core. , 1994, Journal of molecular biology.

[57]  S. Brunak,et al.  Prediction of N-terminal protein sorting signals. , 1997, Current opinion in structural biology.

[58]  K. Chou,et al.  Protein subcellular location prediction. , 1999, Protein engineering.

[59]  T. P. Flores,et al.  Comparison of conformational characteristics in structurally similar protein pairs , 1993, Protein science : a publication of the Protein Society.

[60]  S. Fields Proteomics in Genomeland , 2001, Science.

[61]  K. Nakai Protein sorting signals and prediction of subcellular localization. , 2000, Advances in protein chemistry.

[62]  H Nielsen,et al.  Machine learning approaches for the prediction of signal peptides and other protein sorting signals. , 1999, Protein engineering.

[63]  C. DeLisi,et al.  Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. , 1987, Journal of molecular biology.

[64]  Z. Feng,et al.  Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition. , 2001, Biopolymers.

[65]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[66]  B Rost,et al.  Bridging the protein sequence-structure gap by structure predictions. , 1996, Annual review of biophysics and biomolecular structure.

[67]  Fujiwara,et al.  Prediction of Mitochondrial Targeting Signals Using Hidden Markov Model. , 1997, Genome informatics. Workshop on Genome Informatics.

[68]  G. Heijne A new method for predicting signal sequence cleavage sites. , 1986 .

[69]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[70]  K. Nakai,et al.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. , 1999, Trends in biochemical sciences.

[71]  R Zhang,et al.  Analysis of distribution of bases in the coding sequences by a diagrammatic technique. , 1991, Nucleic acids research.

[72]  K. Chou,et al.  Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. , 2000, Biochemical and biophysical research communications.

[73]  M. Gonzalo Claros,et al.  MitoProt, a Macintosh application for studying mitochondrial proteins , 1995, Comput. Appl. Biosci..

[74]  Paul Horton,et al.  Better Prediction of Protein Cellular Localization Sites with the it k Nearest Neighbors Classifier , 1997, ISMB.

[75]  P Vincens,et al.  Computational method to predict mitochondrially imported proteins and their targeting sequences. , 1996, European journal of biochemistry.