Did evolution leap to create the protein universe?

The genomes of over 60 organisms from all three kingdoms of life are now entirely sequenced. In many respects, the inventory of proteins used in different kingdoms appears surprisingly similar. However, eukaryotes differ from the other kingdoms in that they use many long proteins and have more proteins with coiled-coil helices and with regions abundant in regular secondary structure. Particular structural domains are used in many pathways. Nevertheless, a given domain tends to occur only once in any particular pathway. Many proteins (orphans) have no close homologues in other species, and there could even be folds that are specific to one species. This view implies that protein fold space is discrete. An alternative model suggests that structure space is continuous and that modern proteins evolved by aggregating fragments of ancient proteins. Either way, now that proteomes have been harvested with standard tools, the challenge seems to be to develop better methods for comparative proteomics.