A simulation test bed for hypotheses of genome evolution

MOTIVATION: Microbial genomes undergo evolutionary processes such as gene family expansion and contraction, variable rates and patterns of sequence substitution, and lateral genetic transfer. Simulation tools are essential both for generating data under different evolutionary models and for validating analytical methods on such data. However, meaningful investigation of phenomena such as lateral genetic transfer requires many underlying evolutionary processes to be considered simultaneously.

RESULTS: We have developed EvolSimulator, a software package that combines non-stationary sequence evolution, gene family evolution and models of lateral genetic transfer within a customizable birth-death model of speciation and extinction. Here, we use existing statistical techniques from the evolutionary literature to examine data sets simulated with EvolSimulator, illustrating each component of the simulation strategy in detail.

AVAILABILITY: Source code, a manual and other information are freely available at www.bioinformatics.org.au/evolsim.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
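The abstract places speciation and extinction under a customizable birth-death model. As a rough illustration of what such a model involves, the following is a minimal Python sketch of the simplest case: constant per-lineage speciation and extinction rates, one starting lineage, simulated forward in time with the Gillespie algorithm. It is not EvolSimulator's implementation. The names `simulate_birth_death`, `Lineage` and `extant`, and the constant-rate and single-root assumptions, are hypothetical choices made for this example only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lineage:
    """One branch of the simulated species tree (hypothetical structure)."""
    ident: int
    parent: Optional[int]
    birth_time: float
    end_time: Optional[float] = None  # set when the branch ends, or at max_time
    status: str = "alive"             # "alive", "speciated" or "extinct"


def simulate_birth_death(speciation_rate: float, extinction_rate: float,
                         max_time: float, seed: Optional[int] = None) -> List[Lineage]:
    """Constant-rate birth-death process simulated forward with the Gillespie algorithm.

    Assumption: a single lineage at time 0 and rates that never change.
    Returns every lineage created before max_time, including extinct ones.
    """
    if speciation_rate < 0 or extinction_rate < 0:
        raise ValueError("rates must be non-negative")
    rng = random.Random(seed)
    lineages = [Lineage(ident=0, parent=None, birth_time=0.0)]
    alive = [0]  # indices of lineages that currently exist
    t = 0.0
    per_lineage_rate = speciation_rate + extinction_rate
    while alive and per_lineage_rate > 0:
        # Waiting time to the next event, pooled over all living lineages.
        t += rng.expovariate(len(alive) * per_lineage_rate)
        if t >= max_time:
            break
        # Each living lineage is equally likely to experience the event.
        chosen = alive.pop(rng.randrange(len(alive)))
        lineages[chosen].end_time = t
        if rng.random() < speciation_rate / per_lineage_rate:
            # Speciation: the parent branch ends and two daughter branches start.
            lineages[chosen].status = "speciated"
            for _ in range(2):
                lineages.append(Lineage(ident=len(lineages), parent=chosen, birth_time=t))
                alive.append(len(lineages) - 1)
        else:
            # Extinction: the branch simply ends.
            lineages[chosen].status = "extinct"
    # Lineages still alive are truncated at the end of the simulation.
    for i in alive:
        lineages[i].end_time = max_time
    return lineages


def extant(lineages: List[Lineage]) -> List[Lineage]:
    """Lineages that survive to max_time (the tips of the reconstructed tree)."""
    return [lin for lin in lineages if lin.status == "alive"]


if __name__ == "__main__":
    tree = simulate_birth_death(speciation_rate=1.0, extinction_rate=0.3,
                                max_time=5.0, seed=42)
    print(len(tree), "lineages in total,", len(extant(tree)), "extant")
```

Setting `extinction_rate` to 0 reduces this to the pure-birth (Yule) process, for which the expected number of tips after time t, starting from one lineage, is e^(λt) with λ the speciation rate. A customizable model would relax the constant-rate assumption, for example by making the rates functions of time or of lineage state; gene family and sequence evolution would then be simulated along the resulting branches.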
