Unifying vertical and nonvertical evolution: a stochastic ARG-based framework.

Evolutionary biologists have introduced numerous statistical approaches to explore nonvertical evolution, such as horizontal gene transfer, recombination, and genomic reassortment, through collections of Markov-dependent gene trees. These tree collections allow for inference of nonvertical evolution, but only indirectly, making findings difficult to interpret and models difficult to generalize. An alternative approach relies on phylogenetic networks. These networks provide a framework to model nonvertical evolution but leave questions unanswered, such as the statistical significance of specific nonvertical events. In this paper, we begin to correct the shortcomings of both approaches by introducing the "stochastic model for reassortment and transfer events" (SMARTIE), which draws upon ancestral recombination graphs (ARGs). ARGs are directed graphs that allow for formal probabilistic inference on vertical speciation events and nonvertical evolutionary events. We apply SMARTIE to phylogenetic data, which typically lets us infer a single most probable ARG and avoid coarse population dynamic summary statistics. In addition, a focus on phylogenetic data suggests novel probability distributions on ARGs. To perform inference under our model, we develop a reversible jump Markov chain Monte Carlo sampler to approximate the posterior distribution of SMARTIE. Built on the BEAST phylogenetic software, the sampler employs a parallel computing approach that allows for inference on large-scale data sets. To demonstrate SMARTIE, we explore two separate phylogenetic applications, one involving pathogenic Leptospirochete and the other Saccharomyces.
