COG database update: focus on microbial diversity, model organisms, and widespread pathogens

The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and has gone through several rounds of updates, most recently in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families, with corresponding references and PDB links where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs that contain multiple paralogs, and continued refinement of COG annotations.
