Trends in Prokaryotic Evolution Revealed by Comparison of Closely Related Bacterial and Archaeal Genomes

ABSTRACT To explore microevolutionary trends in bacteria and archaea, we constructed a data set of 41 alignable tight genome clusters (ATGCs). We show that the ratio of the median nonsynonymous substitution rate to the median synonymous substitution rate (dN/dS), used here as a measure of the purifying selection pressure on protein sequences, is a stable characteristic of the ATGCs. In agreement with previous findings, parasitic bacteria, notwithstanding the sometimes dramatic genome shrinkage caused by gene loss, are typically subject to relatively weak purifying selection, presumably owing to relatively small effective population sizes and frequent bottlenecks. However, no evidence of genome streamlining caused by strong selective pressure was found in any of the ATGCs. On the contrary, a significant positive correlation was observed between selective pressure and both genome size and gene size, although free-living prokaryotes with very similar selective pressures span nearly the entire range of genome sizes. In addition, we examined the connections between the sequence evolution rate and other genomic features. Although gene order changes much faster than protein sequences during the evolution of prokaryotes, a strong positive correlation was observed between the “rearrangement distance” and the amino acid distance, suggesting that at least some of the events leading to genome rearrangement are subject to the same type of selective constraints as the evolution of amino acid sequences.
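
The dN/dS measure described in the abstract is a ratio of medians, not a median or mean of per-gene ratios. The following Python sketch illustrates that distinction under stated assumptions: the function names, the input layout (one `(dN, dS)` pair per orthologous gene) and the example values are hypothetical and are not taken from the paper. The mean of per-gene ratios is included only for contrast.

```python
from statistics import mean, median


def ratio_of_medians(pairs):
    """Return median(dN) / median(dS) for a list of (dN, dS) pairs.

    Hypothetical helper: each pair is assumed to hold the substitution
    rate estimates for one orthologous gene within a genome cluster.
    """
    if not pairs:
        raise ValueError("at least one (dN, dS) pair is required")
    median_dn = median(dn for dn, _ in pairs)
    median_ds = median(ds for _, ds in pairs)
    if median_ds == 0:
        raise ValueError("median dS is zero; the ratio is undefined")
    return median_dn / median_ds


def mean_of_ratios(pairs):
    """Return the mean of per-gene dN/dS ratios (shown for contrast only)."""
    ratios = [dn / ds for dn, ds in pairs if ds > 0]
    if not ratios:
        raise ValueError("no pair has a positive dS")
    return mean(ratios)


# Made-up values for illustration; the fourth gene is a deliberate outlier.
example_pairs = [
    (0.010, 0.10),
    (0.020, 0.25),
    (0.030, 0.20),
    (0.500, 0.15),
    (0.015, 0.30),
]

if __name__ == "__main__":
    print("ratio of medians:", ratio_of_medians(example_pairs))
    print("mean of ratios:  ", mean_of_ratios(example_pairs))
```

In this toy input the single outlier gene pulls the mean of per-gene ratios well above the ratio of medians. That robustness is one plausible reason to prefer a ratio of medians, although the abstract itself does not state the motivation.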
