Gene Frequency Distributions Reject a Neutral Model of Genome Evolution

Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
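The two quantities at the center of this analysis, the gene frequency distribution and the AIC-based comparison of fitted models, can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names, the toy presence/absence data, and the two "predicted" spectra are hypothetical, and the AIC is written in its common least-squares form, n ln(RSS/n) + 2k, which assumes independent Gaussian residuals; the abstract does not state which likelihood was actually used.

```python
import math
from collections import Counter


def gene_frequency_spectrum(presence, n_genomes):
    """Return a list g where g[k-1] is the number of genes found in exactly k genomes.

    presence: dict mapping a gene id to the set of genome indices that carry it.
    """
    counts = Counter(len(genomes) for genomes in presence.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]


def aic_least_squares(observed, predicted, n_params):
    """AIC (up to an additive constant) for a least-squares fit.

    Assumes i.i.d. Gaussian residuals, so AIC = n * ln(RSS / n) + 2 * k.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("perfect fit: the log-likelihood term is undefined")
    return n * math.log(rss / n) + 2 * n_params


# Toy data (hypothetical): 8 genes across 4 genomes.
presence = {
    "a": {0, 1, 2, 3}, "b": {0, 1, 2, 3}, "c": {0, 1, 2, 3},  # core genes
    "d": {0, 1},                                              # shell gene
    "e": {2}, "f": {0}, "g": {3}, "h": {1},                   # cloud genes
}
spectrum = gene_frequency_spectrum(presence, n_genomes=4)

# Hypothetical model predictions, for illustration only; these are NOT
# the models tested in the paper.
flat_fit = [2.0, 2.0, 2.0, 2.0]   # one free parameter
u_fit = [3.8, 1.1, 0.4, 2.9]      # two free parameters

aic_flat = aic_least_squares(spectrum, flat_fit, n_params=1)
aic_u = aic_least_squares(spectrum, u_fit, n_params=2)
print(spectrum, aic_flat, aic_u)
```

In this toy case the spectrum is U-shaped (many rare genes, many universal genes, few in between), and the model with the lower AIC is preferred even though it has one more parameter, because the penalty of 2 per parameter is outweighed by the reduction in residual error.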
