ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation

The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of two or more completely sequenced genomes that meet two objective criteria: a high degree of local gene order conservation (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, including ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of ‘index’ orthologs representing the most well-conserved members of each ATGC-COG, and the phylogenetic tree of the organisms within each ATGC. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html.
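The two membership criteria described above (high synteny, few synonymous substitutions) can be illustrated with a minimal sketch. It is not the database's actual pipeline. The threshold values, the use of single-linkage grouping, the function name `tight_clusters`, and the example genome labels and pairwise statistics are all assumptions made up for illustration; the abstract does not specify any of them.

```python
from itertools import combinations

# Hypothetical thresholds for illustration only; NOT the values used by the ATGC database.
MIN_SYNTENY = 0.8  # assumed: fraction of genes found in conserved local order
MAX_DS = 1.5       # assumed: synonymous substitutions per synonymous site


def tight_clusters(genomes, pair_stats, min_synteny=MIN_SYNTENY, max_ds=MAX_DS):
    """Group genomes by single linkage over pairs that pass both criteria.

    pair_stats maps an unordered genome pair (a, b) to (synteny, dS).
    Single linkage is an assumption of this sketch, not a documented method.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        stats = pair_stats.get((a, b)) or pair_stats.get((b, a))
        if stats is None:
            continue  # no comparison available for this pair
        synteny, ds = stats
        if synteny >= min_synteny and ds <= max_ds:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    # A cluster needs two or more genomes, as in the definition above.
    return sorted(sorted(members) for members in groups.values() if len(members) >= 2)


# Made-up example data: labels and numbers are illustrative, not measurements.
example_genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
example_stats = {
    ("genome_A", "genome_B"): (0.95, 0.10),
    ("genome_B", "genome_C"): (0.90, 0.40),
    ("genome_A", "genome_D"): (0.20, 3.00),
}
print(tight_clusters(example_genomes, example_stats))
```

In this toy input, genome_A, genome_B and genome_C end up in one cluster because each is linked to another member by a pair passing both thresholds, while genome_D fails both criteria against genome_A and is left unclustered.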
