Prokaryotic phylogenies inferred from protein structural domains.

Phylogenetic relationships among microorganisms have long been determined primarily from gene sequence information. Because prokaryotes often lack morphological characteristics amenable to phylogenetic analysis, their phylogenies in particular are often based on sequence data. In this work, we explore a new source of phylogenetic information: the distribution of protein structural domains within fully sequenced prokaryotic genomes. The evolution of these structural domains has been studied extensively, which allows us to base our phylogenetic methods on testable theoretical models of structural evolution. We find that the methods that produce reasonable phylogenetic relationships are indeed those most consistent with theoretical evolutionary models. To our knowledge, this work represents the first such theoretically motivated phylogeny, as well as the first application of structural information to phylogeny on this scale. Our results have strong implications for the phylogenetic relationships among prokaryotic organisms and for the understanding of protein evolution as a whole.
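As a rough illustration of how domain distributions could carry phylogenetic signal, the sketch below treats each genome as a set of structural-domain identifiers and computes pairwise Jaccard distances between genomes. This is only an assumed, minimal example: the abstract does not state which distance measure or tree-building method the authors used, and the genome names, domain identifiers, and function names here are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: genome name -> set of structural-domain identifiers.
# Real input would come from domain assignments on fully sequenced genomes.
domains = {
    "genome_A": {"d1", "d2", "d3", "d4"},
    "genome_B": {"d1", "d2", "d3", "d5"},
    "genome_C": {"d1", "d6", "d7"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of domain identifiers."""
    union = a | b
    if not union:
        # Two empty repertoires are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes):
    """Build a symmetric pairwise distance table keyed by (name, name)."""
    names = sorted(genomes)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(genomes[x], genomes[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


dist = distance_matrix(domains)
for (x, y), d in sorted(dist.items()):
    if x < y:
        print(f"{x} vs {y}: {d:.3f}")
```

A distance table of this kind could then be passed to a distance-based tree-building method such as neighbor joining. Presence/absence data could also be analyzed with character-based methods instead, which is a separate design choice not shown here.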
