A phylogenomic reconstruction of the protein world based on a genomic census of protein fold architecture

The protein world has a hierarchical and redundant organization that can be described in terms of protein domains, the evolutionary units of molecular structure. The Structural Classification of Proteins (SCOP) groups domains into a comparatively small set of folding architectures, the protein fold families and superfamilies, which are in turn grouped into protein folds. In this study, we reconstruct the evolution of the protein world using information embedded in a structural genomic census of fold architectures. The census is defined by a phylogenomic analysis of 185 completely sequenced genomes using advanced hidden Markov models and the 776 folds described in SCOP release 1.67. Our study confirms the existence of defined evolutionary patterns of architectural diversification and explores how phylogenomic trees generated from folds relate to those reconstructed from fold superfamilies. These evolutionary patterns help us propose a general conceptual model that describes the growth of architectures in the protein world. © 2006 Wiley Periodicals, Inc. Complexity 12: 27–40, 2006
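To make the idea of a genomic census of fold architectures concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: it assumes that fold assignments (for example, from hidden Markov model searches) are already available as a list of SCOP fold identifiers per genome. The function names `fold_census` and `fold_popularity`, the genome labels, and the toy assignments are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fold_census(assignments):
    """Tabulate how often each fold occurs in each genome.

    assignments: dict mapping a genome name to a list of fold identifiers,
    one entry per detected domain (assumed input format, not from the paper).
    Returns the sorted list of folds and a dict of per-genome count rows
    whose columns follow that fold order.
    """
    folds = sorted({fold for hits in assignments.values() for fold in hits})
    table = {}
    for genome, hits in assignments.items():
        counts = Counter(hits)
        table[genome] = [counts.get(fold, 0) for fold in folds]
    return folds, table


def fold_popularity(folds, table):
    """Count the number of genomes in which each fold occurs at least once."""
    return {
        fold: sum(1 for row in table.values() if row[i] > 0)
        for i, fold in enumerate(folds)
    }


# Toy data: hypothetical assignments for two made-up genomes.
toy = {
    "genome_A": ["c.37", "c.1", "c.37", "a.4"],
    "genome_B": ["c.37", "d.58"],
}
folds, table = fold_census(toy)
popularity = fold_popularity(folds, table)
```

Each row of `table` is one genome's abundance profile over the folds. A matrix of this kind, with genomes and folds as its two axes, is the sort of data from which phylogenomic trees could be built; how the study encodes and analyzes its census is not specified in this abstract.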
