Phylogenetic Model Choice: Justifying a Species Tree or Concatenation Analysis

There are two paradigms for the phylogenetic analysis of multi-locus sequence data: one that forces all genes to share the same underlying history, and another that allows genes to follow idiosyncratic patterns of descent from ancestral alleles. The first approach (concatenation) is clearly a simplified model of the actual process of genome evolution, while the second (species-tree methods) may be overly complex for histories characterized by long divergence times between cladogenetic events. Rather than deciding a priori which of these phylogenetic models to apply to our data, we provide a framework for choosing between concatenation and species-tree methods that treat genes as independently evolving lineages. We show that parametric bootstrapping can be used to assess the extent to which genealogical incongruence across loci can be attributed to phylogenetic estimation error, and we demonstrate the approach using an empirical dataset from 10 species of the Natricine snake subfamily. Because our data exhibit incongruence across loci that is clearly caused by a mixture of coalescent stochasticity and phylogenetic estimation error, we also develop an approach for choosing between species-tree estimation methods that take gene trees as input and those that simultaneously estimate gene trees and species trees.
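As a rough illustration of the parametric-bootstrap logic described above, the sketch below compares an observed among-locus discordance statistic with a null distribution of the same statistic. It is a generic one-sided bootstrap test under stated assumptions, not the specific procedure used in the paper. The function name `bootstrap_p_value`, the choice of statistic (for example, a mean pairwise tree distance among estimated gene trees), and all numbers are hypothetical. The null values are assumed to come from datasets simulated elsewhere under a single shared tree, so that any discordance among them reflects estimation error alone.

```python
from typing import Sequence


def bootstrap_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided parametric-bootstrap p-value.

    Returns the proportion of null replicates whose discordance is at least
    as large as the observed value, with the usual +1 correction so the
    p-value is never exactly zero.
    """
    if len(null_values) == 0:
        raise ValueError("need at least one simulated replicate")
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


# Hypothetical discordance values from replicates simulated on one shared tree.
null_discordance = [2.0, 3.0, 2.5, 4.0, 3.5, 2.0, 3.0, 4.5, 3.0, 2.5]
# Hypothetical discordance measured among the empirical gene trees.
observed_discordance = 6.0

p = bootstrap_p_value(observed_discordance, null_discordance)
print(f"p = {p:.3f}")
```

Under these assumptions, a small p-value would suggest more incongruence than estimation error alone is expected to produce, which argues for a species-tree method, while a large p-value would be compatible with concatenation.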
