The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

Gene tree topologies have proven to be a powerful source of data for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist only for very limited cases. No such methods exist for the general case, primarily because it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data are consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies for obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
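To make the central quantity concrete, here is a minimal sketch of the simplest possible case. It is not the general method described above. The sketch assumes three taxa, one sampled lineage per species, and a single hybridization in which taxon B is the hybrid. It relies on the standard three-taxon multispecies coalescent result: with internal branch length t in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^{-t}, and each of the two alternative topologies has probability (1/3)e^{-t}. Under these assumptions, B's lineage follows the parental tree ((A,B),C) with hybridization probability gamma or (A,(B,C)) with probability 1 - gamma. The network probability is then a gamma-weighted mixture of the two parental-tree probabilities. The function names and the parameters `gamma`, `t1`, `t2` are hypothetical. A grid-search likelihood over gamma, with branch lengths held fixed, hints at how hybridization probabilities can be estimated from counts of gene tree topologies.

```python
import math


def triplet_probs(t):
    """Three-taxon gene tree probabilities under the multispecies coalescent.

    t: internal branch length of the species tree, in coalescent units.
    Returns (P(matching topology), P(each of the two mismatching topologies)).
    """
    mismatch = math.exp(-t) / 3.0
    return 1.0 - 2.0 * mismatch, mismatch


def network_gene_tree_probs(gamma, t1, t2):
    """Rooted gene tree probabilities on a hypothetical three-taxon network.

    B is the hybrid: its lineage follows parental tree T1 = ((A,B),C), with
    internal branch t1, with probability gamma, and parental tree
    T2 = ((B,C),A), with internal branch t2, with probability 1 - gamma.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    match1, mis1 = triplet_probs(t1)  # topology probabilities under T1
    match2, mis2 = triplet_probs(t2)  # topology probabilities under T2
    return {
        "((A,B),C)": gamma * match1 + (1.0 - gamma) * mis2,
        "((A,C),B)": gamma * mis1 + (1.0 - gamma) * mis2,
        "((B,C),A)": gamma * mis1 + (1.0 - gamma) * match2,
    }


def log_likelihood(counts, gamma, t1, t2):
    """Multinomial log-likelihood (up to a constant) of observed topology counts."""
    probs = network_gene_tree_probs(gamma, t1, t2)
    return sum(n * math.log(probs[topo]) for topo, n in counts.items() if n > 0)


def estimate_gamma(counts, t1, t2, steps=1000):
    """Grid-search maximum-likelihood estimate of gamma with t1, t2 held fixed."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: log_likelihood(counts, g, t1, t2))


if __name__ == "__main__":
    print(network_gene_tree_probs(0.3, 1.0, 1.0))
    counts = {"((A,B),C)": 312, "((A,C),B)": 123, "((B,C),A)": 565}
    print(estimate_gamma(counts, 1.0, 1.0))
```

Setting `gamma` to 1 or 0 recovers the ordinary species-tree probabilities for T1 or T2. Intermediate values shift probability mass between the two topologies that group B with A or with C, while ((A,C),B) stays at its incomplete-lineage-sorting level.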
