Adaptation of protein surfaces to subcellular location.

In vivo, proteins occur in widely different physicochemical environments, and in vitro studies show that protein structure can be very sensitive to environment. However, theoretical studies of protein structure have tended to ignore this complexity. In this paper, we approach the problem by grouping proteins by their subcellular location and looking for structural properties that are characteristic of each location. We hypothesize that, throughout evolution, each subcellular location has maintained a characteristic physicochemical environment, and that the proteins in each location have adapted to that environment. If so, protein structures from different locations should show characteristic differences, particularly at the surface, which is directly exposed to the environment. To test this hypothesis, we examined all eukaryotic proteins of known three-dimensional structure whose subcellular location is known to be nuclear, cytoplasmic, or extracellular. In agreement with previous studies, we find that the total amino acid composition carries a signal that identifies the subcellular location. This signal was due almost entirely to the surface residues. The surface residue signal was often strong enough to predict subcellular location accurately, given only a knowledge of which residues are at the protein surface. The results suggest how the accuracy of predicting location from sequence can be improved. We conclude that protein surfaces show adaptation to their subcellular location. The nature of these adaptations suggests several principles that proteins may have used in adapting to particular physicochemical environments; these principles may be useful for protein design.
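
The idea of predicting location from surface composition can be illustrated with a minimal sketch. This is not the method used in the paper, whose classifier and accessibility cutoff are not given in the abstract. It assumes each residue comes with a relative solvent accessibility between 0 and 1, uses an arbitrary cutoff of 0.25 to call a residue "surface", and swaps in a simple nearest-centroid rule. The function names and the toy residue lists are hypothetical and invented for illustration only.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def surface_composition(residues, threshold=0.25):
    """Return the 20-dimensional composition of surface residues.

    residues: list of (one_letter_code, relative_accessibility) pairs.
    threshold: assumed accessibility cutoff for "surface" (illustrative).
    """
    surface = [aa for aa, acc in residues
               if acc >= threshold and aa in AMINO_ACIDS]
    if not surface:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(surface)
    return [counts[aa] / len(surface) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Average a list of equal-length composition vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict_location(vector, centroids):
    """Pick the location whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda loc: math.dist(vector, centroids[loc]))


# Toy data, invented for illustration; not taken from any real structure.
nuclear_like = [("K", 0.8), ("R", 0.7), ("L", 0.05), ("K", 0.6)]
extracellular_like = [("S", 0.9), ("T", 0.5), ("C", 0.1), ("N", 0.6)]

centroids = {
    "nuclear": centroid([surface_composition(nuclear_like)]),
    "extracellular": centroid([surface_composition(extracellular_like)]),
}

query = [("R", 0.9), ("K", 0.4), ("V", 0.0)]
predicted = predict_location(surface_composition(query), centroids)
```

In practice the centroids would be averaged over many proteins per location, and the accessibilities would come from a structure-analysis tool rather than being typed in by hand.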
