Estimation of prokaryotic supergenome size and composition from gene frequency distributions

Background: Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir, which we denote the supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions.

Results: We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed either that genomes were random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or, alternatively, composed of genes with Gamma-distributed gain probabilities. Empirical gene frequencies were used either to compute the likelihood of the data directly or first to reconstruct the history of gene gains and then to compute the likelihood of the reconstructed numbers of gains.

Conclusions: Supergenome size estimates that use the empirical gene frequencies directly are not robust with respect to the choice of the model. By contrast, using the gene frequencies and the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma-distributed gain probabilities.
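To make the simplest of these assumptions concrete, the sketch below estimates the size of a single homogeneous reservoir from a gene frequency distribution, treating genomes as random, independent samples. It is an illustration only, not the procedure used in the paper: the function name `estimate_reservoir_size`, the single-reservoir restriction, and the use of a zero-truncated binomial likelihood are assumptions made here for the example. Each gene is assumed to be present in each of `k` genomes independently with the same probability `p`; genes found in no genome are unobserved, so the reservoir size is recovered by correcting the observed gene count for that missing class.

```python
def estimate_reservoir_size(freq_counts, k):
    """Estimate (p, size) for one homogeneous reservoir (illustrative sketch).

    freq_counts[j - 1] is the number of distinct genes found in exactly j
    of the k genomes, for j = 1..k. Genes present in zero genomes are not
    observed, so the counts follow a zero-truncated binomial distribution.
    """
    if len(freq_counts) != k:
        raise ValueError("freq_counts must have exactly k entries")
    observed = sum(freq_counts)
    if observed == 0:
        raise ValueError("no genes observed")

    # Mean number of genomes per observed gene.
    mean_j = sum(j * c for j, c in enumerate(freq_counts, start=1)) / observed
    if mean_j <= 1.0:
        raise ValueError("all genes are singletons; p is not identifiable")

    def truncated_mean(p):
        # Mean of a binomial(k, p) conditioned on being at least 1.
        return k * p / (1.0 - (1.0 - p) ** k)

    # The maximum-likelihood p solves truncated_mean(p) = mean_j.
    # truncated_mean increases monotonically from 1 to k, so bisect.
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mean_j:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)

    # Scale the observed gene count by the probability of being seen at all.
    size = observed / (1.0 - (1.0 - p) ** k)
    return p, size


if __name__ == "__main__":
    # Hypothetical counts for k = 4 genomes (not data from the paper).
    p_hat, size_hat = estimate_reservoir_size([400, 600, 400, 100], k=4)
    print(f"p = {p_hat:.4f}, estimated reservoir size = {size_hat:.1f}")
```

Because this estimate depends entirely on the assumed sampling model, it illustrates the sensitivity noted in the Conclusions: a different assumption about the reservoir (several sub-reservoirs, or Gamma-distributed gain probabilities) would change the correction for unobserved genes and hence the size estimate.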
