Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent

Recent work on estimating species relationships from gene trees has included inferring networks under the assumption that hybridization has occurred between species. Probabilistic models based on the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable, that is, whether two distinct networks could lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that, in some cases in which two distinct networks give the same distribution of gene trees when one allele is sampled per species, the two networks can in theory be distinguished when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when only one lineage is sampled per species. [gene tree, hybridization, identifiability, maximum likelihood, species tree, phylogeny]
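The linear-combination idea can be illustrated with a minimal sketch. It is not the paper's general representation: it assumes a three-taxon network in which species B is a hybrid with a single sampled lineage, so that the gene tree topology distribution is a two-component mixture of species-tree distributions. The inheritance probability `gamma`, the branch lengths, and the function names are hypothetical values chosen for illustration; the per-tree probabilities use the standard three-taxon coalescent result, in which each discordant topology has probability exp(-t)/3 for internal branch length t in coalescent units.

```python
import math

# The three rooted gene tree topologies on species A, B, C.
TOPOLOGIES = ("((A,B),C)", "((B,C),A)", "((A,C),B)")


def three_taxon_tree_probs(species_topology, t):
    """Gene tree topology probabilities for a rooted three-taxon species tree.

    Assumes one lineage per species and an internal branch of length t
    (coalescent units). Each discordant topology has probability exp(-t)/3.
    """
    discordant = math.exp(-t) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def mixture_probs(weighted_trees):
    """Linear combination of species-tree gene tree distributions.

    weighted_trees is a list of (weight, species_topology, t) tuples whose
    weights are assumed to sum to 1.
    """
    total = {g: 0.0 for g in TOPOLOGIES}
    for weight, topology, t in weighted_trees:
        probs = three_taxon_tree_probs(topology, t)
        for g in TOPOLOGIES:
            total[g] += weight * probs[g]
    return total


# Hypothetical network: B traces to the A side with probability gamma
# and to the C side with probability 1 - gamma.
gamma = 0.3
network_probs = mixture_probs([
    (gamma, "((A,B),C)", 1.0),
    (1.0 - gamma, "((B,C),A)", 0.5),
])

for topology in TOPOLOGIES:
    print(topology, round(network_probs[topology], 4))
```

Two networks are indistinguishable from gene tree topologies under this sampling scheme exactly when their mixtures yield the same three probabilities. When several alleles are sampled from the hybrid species, each lineage can trace to a different parent, so the set of species trees and their weights in the combination change; this sketch does not cover that case.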
