Practical Aspects of Phylogenetic Network Analysis Using PhyloNet

Phylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in the input data they require, the criteria and models they employ, and the types of information they allow one to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks. In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS)-aware; that is, they account for the possibility of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy. The concept of species is irrelevant for the computational analyses enabled by PhyloNet, in that the mapping of individuals to species is user-defined. Consequently, none of the functionalities in PhyloNet addresses the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data have sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is inferred.
