Bayesian Inference of Species Trees from Multilocus Data

Until recently, it has been common practice for a phylogenetic analysis to use a single gene sequence from a single individual organism as a proxy for an entire species. With technological advances, it is now becoming more common to collect data sets containing multiple gene loci and multiple individuals per species. These data sets often reveal the need to directly model intraspecies polymorphism and incomplete lineage sorting in phylogenetic estimation procedures. For a single species, coalescent theory is widely used in contemporary population genetics to model intraspecific gene trees. Here, we present a Bayesian Markov chain Monte Carlo method for the multispecies coalescent. Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. The inference is made possible by multilocus data from multiple individuals per species. Using a multiindividual data set and a series of simulations of rapid species radiations, we demonstrate the efficacy of our new method. These simulations give some insight into the behavior of the method as a function of sampled individuals, sampled loci, and sequence length. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.
