A glimpse at the organization of the protein universe.

The amino acid sequences of most natural proteins enable them to fold into specific structures that confer biological activity while avoiding misfolding and aggregation (1). The data currently available suggest that the overall architecture (the “fold”) of these structures is conserved much more strongly during evolution than the sequences that encode them. Folds have therefore emerged as ideal candidates for classifying proteins (Fig. 1), and hence for beginning to bring order to the protein universe (2). Continuing advances in structural biology, and in particular the recent emergence of structural genomics initiatives that place special emphasis on discovering new folds (3), now offer an opportunity to build a comprehensive map of the protein universe. Of particular significance, the number of distinct structural archetypes, or folds, is thought to be relatively small (fewer than about 10,000 by most estimates), with many different sequences able to encode the same basic fold of the polypeptide chain (4). A key question in the analysis of protein sequences and structures is how they relate to function. Answers to this question will not only shed light on the fundamental organization of the protein universe and on where natural proteins lie within it, but will also provide a means of predicting the functions of proteins whose functions have not yet been determined experimentally. The ability to predict function will be of tremendous value, for example in interpreting the output of genome sequencing programs or in designing new proteins with specific functional characteristics.
In a recent issue of PNAS, Kim and colleagues (5) take a significant step toward this objective by extending their earlier study (6) to show that proteins with similar functions are found close together in the protein universe, provided that the universe is organized on structural grounds in a suitable way, which they term the structure space map (SSM).

Fig. 1. The totality of all possible proteins can be viewed in three different ways: (i) the protein universe, which is formed by all possible amino acid sequences; (ii) the protein fold universe, which contains all possible folds associated with these sequences; ...
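The central idea of a structure space map, placing proteins as points so that structurally similar proteins lie close together and then reading off the likely function of an uncharacterized protein from its neighbours, can be sketched in a few lines. The sketch is illustrative only and rests on stated assumptions: it takes a precomputed matrix of pairwise structural distances as given (how such distances are scored is not specified here), uses classical (Torgerson) multidimensional scaling as one standard way of turning that matrix into coordinates, and assigns function by the nearest annotated neighbour. The function names `classical_mds` and `predict_function`, and the toy distances and labels, are hypothetical and are not taken from the work of Kim and colleagues.

```python
import numpy as np


def classical_mds(dist, k=3):
    """Embed n items in k dimensions from an n x n distance matrix (classical MDS).

    Illustrative choice of embedding method; not necessarily how any published SSM is built.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J the centring matrix
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the k largest
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues signal non-Euclidean distances; clip them to zero
    vals = np.clip(vals[idx], 0.0, None)
    return vecs[:, idx] * np.sqrt(vals)


def predict_function(coords, labels, query):
    """Return the label of the nearest annotated protein to `query` in the map (hypothetical rule)."""
    dists = np.linalg.norm(coords - coords[query], axis=1)
    dists[query] = np.inf  # never match the query to itself
    for i in np.argsort(dists):
        if labels[i] is not None:
            return labels[i]
    return None


# Hypothetical toy distances for five proteins, NOT real structural data:
# proteins 0-2 form one structural group, proteins 3-4 another.
dist = np.array([
    [0.0, 1.0, 1.2, 5.0, 5.2],
    [1.0, 0.0, 1.1, 5.1, 5.0],
    [1.2, 1.1, 0.0, 4.9, 5.1],
    [5.0, 5.1, 4.9, 0.0, 1.0],
    [5.2, 5.0, 5.1, 1.0, 0.0],
])
labels = ["kinase", "kinase", None, "protease", "protease"]  # None = function unknown
coords = classical_mds(dist, k=2)
print(predict_function(coords, labels, query=2))  # prints: kinase
```

Classical MDS is used here because it has a closed-form solution via an eigendecomposition, and for distances that are exactly Euclidean it recovers the original configuration up to rotation and translation. Nearest-neighbour transfer is the simplest possible inference rule; it only works to the extent that functional similarity really does track proximity in the map.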

[1] S. Copley. Enzymes with extra talents: moonlighting functions and catalytic promiscuity. Current Opinion in Chemical Biology, 2003.

[2] Chris Sander et al. Completeness in structural genomics. Nature Structural Biology, 2001.

[3] Allen P. Minton et al. Cell biology: Join the crowd. Nature, 2003.

[4] J. Whisstock et al. Protein structural alignments and functional genomics. Proteins, 2001.

[5] Sung-Hou Kim et al. A global representation of the protein fold space. Proceedings of the National Academy of Sciences of the United States of America, 2003.

[6] C. Dobson. Chemical space and biology. Nature, 2004.

[7] C. Dobson. Protein folding and misfolding. Nature, 2003.

[8] Kresten Lindorff-Larsen et al. Protein folding and the organization of the protein topology universe. Trends in Biochemical Sciences, 2005.

[9] M. Karplus et al. Three key residues form a critical contact network in a protein folding transition state. Nature, 2001.

[10] Michele Vendruscolo et al. Towards complete descriptions of the free-energy landscapes of proteins. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2005.

[11] E. Koonin et al. The structure of the protein universe and genome evolution. Nature, 2002.

[12] M. Ashburner et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000.

[13] John Moult et al. A unifold, mesofold, and superfold model of protein fold use. Proteins, 2002.

[14] Sung-Hou Kim et al. Global mapping of the protein structure space and application in structure-based inference of protein function. Proceedings of the National Academy of Sciences of the United States of America, 2005.

[15] S. Buchanan. Structural genomics: bridging functional genomics and structure-based drug design. Current Opinion in Drug Discovery & Development, 2002.

[16] Charles DeLisi et al. Functional fingerprints of folds: evidence for correlated structure-function evolution. Journal of Molecular Biology, 2003.

[17] C. Sander et al. Mapping the Protein Universe. Science, 1996.

[18] Anton J. Enright et al. Classification schemes for protein structure and function. Nature Reviews Genetics, 2003.

[19] A. Lesk et al. Determinants of a protein fold. Unique features of the globin amino acid sequences. Journal of Molecular Biology, 1987.

[20] M. Sternberg et al. Automated prediction of protein function and detection of functional sites from structure. Proceedings of the National Academy of Sciences of the United States of America, 2004.

[21] Flavio Seno et al. Geometry and symmetry presculpt the free-energy landscape of proteins. Proceedings of the National Academy of Sciences of the United States of America, 2004.

[22] M. Levitt et al. A structural census of the current population of protein sequences. Proceedings of the National Academy of Sciences of the United States of America, 1997.

[23] O. Ptitsyn et al. Protein folding and protein evolution: common folding nucleus in different subfamilies of c-type cytochromes? Journal of Molecular Biology, 1998.

[24] J. M. Thornton et al. Three-dimensional structure analysis of PROSITE patterns. Journal of Molecular Biology, 1999.