Empirical Performance of Tree-based Inference of Phylogenetic Networks

Phylogenetic networks extend the phylogenetic tree structure and allow vertical and horizontal evolution to be modeled in a single framework. Statistical inference of phylogenetic networks is computationally prohibitive and is currently limited to small networks. One approach that could significantly improve exploration of the phylogenetic network space is to first infer an evolutionary tree of the species under consideration, and then augment the tree into a network by adding a set of “horizontal” edges to better fit the data. In this paper, we study the performance of such an approach on networks generated under a birth-hybridization model and explore its feasibility as an alternative to approaches that search the phylogenetic network space directly (without relying on a fixed underlying tree). We find that the concatenation method does poorly at obtaining a “backbone” tree that could be augmented into the correct network, whereas the popular species tree inference method ASTRAL does significantly better at this task. We then evaluate the tree-to-network augmentation phase under the minimizing deep coalescence and pseudo-likelihood criteria. We find that even though this approach is much faster than a direct search of the network space, its accuracy is much poorer, even when the backbone tree is a good starting tree. Our results show that tree-based inference of phylogenetic networks could yield very poor results. Because directly exploring the network space in search of maximum likelihood estimates or a representative sample of the posterior is very expensive, significant reductions in the computational cost of phylogenetic network inference are imperative if large data sets are to be analyzed. We show that a recently developed divide-and-conquer approach significantly outperforms tree-based inference in terms of accuracy, albeit still at a higher computational cost.
