Computer Programs and Methodologies for the Simulation of DNA Sequence Data with Recombination

Computer simulations are useful in evolutionary biology for testing hypotheses, verifying analytical methods, analyzing interactions among evolutionary processes, and estimating evolutionary parameters. In particular, simulating DNA sequences with recombination can help clarify the role of recombination in diverse evolutionary questions, such as genome structure. Consequently, many computer programs have been developed to simulate DNA sequence data with recombination. However, choosing an appropriate tool among the currently available simulators is critical if recombination simulations are to be biologically meaningful. This review provides a practical survival guide to commonly used computer programs and methodologies for simulating coding and non-coding DNA sequences with recombination, which may help in correctly designing computer simulation experiments of recombination. In addition, it reviews simulation studies that investigate the impact of ignoring recombination in various evolutionary analyses, such as phylogenetic tree and ancestral sequence reconstruction. Alternative analytical methodologies that account for recombination are also reviewed.
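A toy example can illustrate why ignoring recombination may mislead tree reconstruction: after a single crossover, the two halves of a recombinant sequence have different closest relatives, so no single bifurcating tree fits both. The Python sketch below is not taken from any of the reviewed simulators, which typically generate recombination through the coalescent with recombination (the ancestral recombination graph); it only shows the consequence of one crossover event. The function names (`random_sequence`, `mutate`, `recombine`, `p_distance`), the sequence length, the number of substitutions, and the breakpoint position are illustrative assumptions.

```python
import random

BASES = "ACGT"

def random_sequence(length: int, rng: random.Random) -> str:
    """Draw an ancestral sequence with equal base frequencies."""
    return "".join(rng.choice(BASES) for _ in range(length))

def mutate(seq: str, n_sites: int, rng: random.Random) -> str:
    """Substitute n_sites distinct positions, each replaced by one of the three
    other bases chosen uniformly (equal-rate substitutions, no multiple hits).
    This is a simplifying assumption, not a full substitution model."""
    out = list(seq)
    for pos in rng.sample(range(len(out)), n_sites):
        out[pos] = rng.choice([b for b in BASES if b != out[pos]])
    return "".join(out)

def recombine(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Single crossover: sites before breakpoint come from parent_a, the rest from parent_b."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]

def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Hypothetical scenario: two lineages diverge from a common ancestor,
# then a recombinant inherits its left half from A and its right half from B.
rng = random.Random(42)
ancestor = random_sequence(1000, rng)
parent_a = mutate(ancestor, 100, rng)
parent_b = mutate(ancestor, 100, rng)
BREAKPOINT = 500
recombinant = recombine(parent_a, parent_b, BREAKPOINT)

# Compare the recombinant with each parent separately on each side of the breakpoint.
for label, sl in (("left", slice(0, BREAKPOINT)), ("right", slice(BREAKPOINT, None))):
    d_a = p_distance(recombinant[sl], parent_a[sl])
    d_b = p_distance(recombinant[sl], parent_b[sl])
    print(f"{label} segment: d(rec, A) = {d_a:.3f}, d(rec, B) = {d_b:.3f}")
```

Running it should report zero distance to parent A on the left segment and zero distance to parent B on the right segment. Over the full length, however, the recombinant sits roughly midway between the two parents, so a single tree built from the whole alignment would hide this mosaic signal.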
