Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales

With genome sequencing of diverse groups of archaea and bacteria continuously accelerating, accurate identification of gene orthology and the availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) that covers, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced: superclusters, which unite two or more arCOGs and reflect gene family evolution more completely than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve annotation quality. In addition to their utility for genome annotation, arCOGs are a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci, which are traditionally viewed as the basal branch of the Euryarchaeota. The analysis combined a comparison of multiple phylogenetic trees with a search for putative derived shared characters using phyletic patterns extracted from the arCOGs, and its results reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.
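The search for putative derived shared characters can be illustrated with a minimal sketch. It assumes a phyletic pattern is simply the set of genomes in which an arCOG is present, and it flags arCOGs found in every member of a group of interest and in no genome outside it. The function name, thresholds, arCOG identifiers, and genome labels below are all hypothetical and chosen for illustration; this is not the procedure used in the study.

```python
from typing import Dict, Iterable, List, Set


def candidate_shared_characters(
    patterns: Dict[str, Set[str]],
    ingroup: Iterable[str],
    all_genomes: Iterable[str],
    min_in_fraction: float = 1.0,
    max_out: int = 0,
) -> List[str]:
    """Return arCOG ids present in the ingroup and (nearly) absent elsewhere.

    patterns maps an arCOG id to the set of genomes that encode a member.
    min_in_fraction and max_out are illustrative thresholds, not values
    taken from the paper.
    """
    in_set = set(ingroup)
    out_set = set(all_genomes) - in_set
    hits = []
    for cog_id, present in patterns.items():
        in_count = len(present & in_set)
        out_count = len(present & out_set)
        if in_count >= min_in_fraction * len(in_set) and out_count <= max_out:
            hits.append(cog_id)
    return sorted(hits)


# Toy data: every identifier and label here is made up.
GENOMES = [
    "thermococcus_A",
    "methanococcus_A",
    "methanobacterium_A",
    "sulfolobus_A",
    "halobacterium_A",
]
INGROUP = ["thermococcus_A", "methanococcus_A", "methanobacterium_A"]
PATTERNS = {
    "arCOG_X1": {"thermococcus_A", "methanococcus_A", "methanobacterium_A"},
    "arCOG_X2": set(GENOMES),
    "arCOG_X3": {"thermococcus_A", "sulfolobus_A"},
    "arCOG_X4": {
        "thermococcus_A",
        "methanococcus_A",
        "methanobacterium_A",
        "halobacterium_A",
    },
}

strict = candidate_shared_characters(PATTERNS, INGROUP, GENOMES)
relaxed = candidate_shared_characters(PATTERNS, INGROUP, GENOMES, max_out=1)
print(strict)
print(relaxed)
```

A pattern that matches in this way is only a candidate: the same presence/absence distribution can also arise from horizontal gene transfer or from gene loss in the other lineages, which is why the abstract describes such characters as putative and pairs them with tree comparison.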
