A divide-and-conquer method for scalable phylogenetic network inference from multilocus data

Abstract

Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes.

Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method first infers networks on subproblems, namely trinets (networks on three taxa), and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate the problem of finding a small number of taxon subsets as a version of the Hitting Set problem, and implement a simple heuristic to solve it. We evaluated performance, in terms of both running time and accuracy, on simulated as well as biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference.

Availability and implementation: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).

Supplementary information: Supplementary data are available at Bioinformatics online.
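The abstract does not describe the Hitting Set heuristic itself. To make the idea concrete, here is a minimal sketch of the classic greedy heuristic for the generic Hitting Set problem. It is not the PhyloNet implementation. The function name `greedy_hitting_set` and the toy requirement that every pair of taxa must appear together in at least one selected three-taxon subset are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def greedy_hitting_set(constraints):
    """Return a list of elements that intersects every set in `constraints`.

    Generic greedy heuristic (illustrative only, not the paper's algorithm):
    repeatedly pick the element contained in the most not-yet-hit sets.
    """
    remaining = [frozenset(s) for s in constraints]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")
    chosen = []
    while remaining:
        # Count how many still-unhit sets each element would hit.
        counts = defaultdict(int)
        for s in remaining:
            for element in s:
                counts[element] += 1
        # Ties are broken by taking the smallest element in sorted order.
        best = max(sorted(counts), key=lambda e: counts[e])
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical toy setup: candidate elements are all three-taxon subsets,
# and each constraint is the set of triples that contain a given taxon pair.
taxa = ["A", "B", "C", "D", "E"]
triples = list(combinations(taxa, 3))
constraints = [
    {t for t in triples if set(pair) <= set(t)}
    for pair in combinations(taxa, 2)
]
chosen = greedy_hitting_set(constraints)
print(f"{len(chosen)} of {len(triples)} triples selected:", chosen)
```

Under this assumed requirement, only the selected triples would need a trinet inferred, rather than all possible three-taxon subsets. The greedy rule gives no optimality guarantee; it is simply a cheap way to obtain a small selection.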
