ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Constructing a data set with appropriate characteristics is a major hurdle for studies of this type. At the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extracting specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data, such as multiple alignments of orthologous proteins and matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates, among others. The ATGC database will be regularly updated following new releases of NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.
