Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer

Background: Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.

Results: The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last common ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer.

Conclusions: The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.

Reviewers: This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
