Estimating Species Trees Using Multiple-Allele DNA Sequence Data

Abstract: Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single-allele multilocus data. A two-step Markov chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets: yeast (Saccharomyces) and birds (Manacus, the bearded manakins). The species tree estimates from our method are consistent with those inferred from other methods and genetic markers but, in contrast to other species tree methods, our method also provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and for the integration of population genetics and phylogenetics.
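As a reading aid, the hierarchical structure described in the abstract can be sketched in a generic form. The notation below is an assumption introduced here for illustration and is not taken from the paper: D_i is the sequence alignment at locus i, G_i is the gene tree at that locus, S is the species tree, and θ stands for the population size parameters. The sketch assumes the loci are conditionally independent given the species tree; the paper's exact parameterization and priors may differ.

```latex
% Generic hierarchical factorization (illustrative notation, not the paper's own).
% D = (D_1, ..., D_K): alignments at K loci; G = (G_1, ..., G_K): gene trees.
\begin{align*}
  f(S, \theta, G \mid D)
    &\propto \Bigl[\prod_{i=1}^{K} f(D_i \mid G_i)\, f(G_i \mid S, \theta)\Bigr]\, f(S, \theta), \\
  f(S \mid D)
    &= \int \sum_{G} f(S, \theta, G \mid D)\, d\theta .
\end{align*}
```

Here f(D_i | G_i) is the sequence likelihood at locus i, f(G_i | S, θ) is the distribution of the gene tree given the species tree (for example, under a coalescent model), and f(S, θ) is the prior. The second line is the marginal posterior of the species tree, the quantity the MCMC algorithm is said to estimate.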
