A phylogenetic mixture model for gene family loss in parasitic bacteria.

Gene families are frequently gained and lost from prokaryotic genomes. It is widely believed that the rate of loss was accelerated for some, but not all, gene families in lineages that became parasites or endosymbionts. This leads to a form of heterotachy that may be responsible for the poor performance of phylogeny estimation based on gene content. We describe a mixture model that accounts for this heterotachy. We show that this model fits data on the distribution of gene families across bacteria from the COG database much better than previous models. However, it still favors an artifactual tree topology, in which parasites form a clade, over the more plausible 16S topology. In contrast to a previous model of genome dynamics, our model suggests that the ancestral bacterium had a small genome. We suggest that models of gene family gain and loss are likely to be more useful for understanding genome dynamics than for estimating phylogenetic trees.
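
To make the idea of a mixture over loss-rate classes concrete, the following is a minimal sketch and not the model fitted in the paper. It assumes a two-state (absent/present) continuous-time Markov chain for each gene family, a hypothetical three-genome tree with one branch flagged as parasitic, and two mixture classes: one in which loss is accelerated by a factor `boost` on parasitic branches, and one in which it is not. All names, the tree, and the parameter values are illustrative assumptions. The sketch also omits corrections a real analysis would need, such as conditioning on families observed in at least one genome.

```python
import math
from itertools import product

def transition(gain, loss, t):
    """2x2 transition probabilities for absent(0)/present(1) over time t."""
    total = gain + loss
    decay = math.exp(-total * t)
    return [
        [(loss + gain * decay) / total, gain * (1 - decay) / total],
        [loss * (1 - decay) / total, (gain + loss * decay) / total],
    ]

def partial(node, pattern, gain, loss, boost):
    """Pruning step: likelihood of the data below `node` given its state."""
    if isinstance(node, str):  # leaf: observed presence/absence
        return [1.0 if pattern[node] == s else 0.0 for s in (0, 1)]
    out = [1.0, 1.0]
    for child, t, parasitic in node:
        # Loss is multiplied by `boost` only on branches flagged as parasitic.
        p = transition(gain, loss * (boost if parasitic else 1.0), t)
        below = partial(child, pattern, gain, loss, boost)
        for s in (0, 1):
            out[s] *= p[s][0] * below[0] + p[s][1] * below[1]
    return out

def class_likelihood(tree, pattern, gain, loss, boost, root_present):
    """Likelihood of one presence/absence pattern under a single class."""
    at_root = partial(tree, pattern, gain, loss, boost)
    return (1 - root_present) * at_root[0] + root_present * at_root[1]

def mixture_likelihood(tree, pattern, gain, loss, boost, weight, root_present):
    """Weighted sum over the accelerated-loss class and the baseline class."""
    fast = class_likelihood(tree, pattern, gain, loss, boost, root_present)
    base = class_likelihood(tree, pattern, gain, loss, 1.0, root_present)
    return weight * fast + (1 - weight) * base

# Hypothetical tree: internal nodes are lists of (child, length, parasitic).
LEAVES = ("free_a", "free_b", "parasite")
TREE = [
    ("free_a", 0.3, False),
    ([("free_b", 0.2, False), ("parasite", 0.2, True)], 0.1, False),
]
# Illustrative parameter values, not estimates from any data set.
PARAMS = dict(gain=0.1, loss=0.5, boost=5.0, weight=0.3, root_present=0.5)

def all_patterns():
    """Every presence/absence pattern over the three example genomes."""
    return [dict(zip(LEAVES, bits)) for bits in product((0, 1), repeat=3)]
```

For example, `mixture_likelihood(TREE, {"free_a": 1, "free_b": 1, "parasite": 0}, **PARAMS)` gives the probability of a family that is retained in both free-living genomes but missing from the parasite. Under these assumptions the accelerated-loss class assigns that pattern a higher likelihood than the baseline class, which is the kind of lineage-specific rate shift the mixture is meant to absorb.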
