A structural perspective on genome evolution.

Protein translations of over 100 complete genomes are now available. About half of these sequences can be provided with structural annotation, thereby enabling some profound insights into protein and pathway evolution. Whereas the major domain structure families are common to all kingdoms of life, they are combined in different ways in multidomain proteins to give various domain architectures that are specific to kingdoms or individual genomes and that contribute to the diverse phenotypes observed. These data argue for more targets in structural genomics initiatives, and particularly for the selection of different domain architectures, to gain better insights into protein functions.
