Unifying Gene Duplication, Loss, and Coalescence on Phylogenetic Networks

Statistical methods were recently introduced for inferring phylogenetic networks under the multispecies network coalescent, thus accounting for both reticulation and incomplete lineage sorting (ILS). Gene duplication and loss (GDL), two evolutionary processes that are ubiquitous across all three domains of life, are not accounted for by those methods. In this work, we devise a three-piece model, consisting of a phylogenetic network, a locus network, and a gene tree, that unifies ILS, GDL, and introgression into a single account of how genes evolve within the branches of a phylogenetic network. To illustrate the power of this model, we develop an algorithm for estimating the parameters associated with a phylogenetic network topology under this unified model. The algorithm consists of a set of moves that allow for stochastic search through the parameter space. The main challenge in developing such moves stems from the intricate dependencies among the three pieces of the model. We demonstrate the application of the model and the accuracy of the algorithm on simulated as well as biological data. Our work adds to the biologist’s toolbox of methods for phylogenomic inference by accounting for more complex evolutionary processes.
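
To make the notion of a move-based stochastic search concrete, the following is a minimal Python sketch and not the algorithm of this work. It assumes a toy two-parameter space, and every name in it is hypothetical: the parameters `dup_rate` and `loss_rate`, the scoring function `toy_log_score` (a stand-in for a model likelihood), and the single `scale_move`. It also uses a simple greedy acceptance rule. The moves described in the abstract must additionally keep the phylogenetic network, locus network, and gene tree mutually consistent, which this sketch does not attempt.

```python
import math
import random


def toy_log_score(params):
    """Hypothetical stand-in for a model likelihood; peaks at (0.2, 0.1)."""
    return -((params["dup_rate"] - 0.2) ** 2 + (params["loss_rate"] - 0.1) ** 2)


def scale_move(params, key, rng, width=0.5):
    """Propose a new state by rescaling one parameter by a random positive factor."""
    proposal = dict(params)
    proposal[key] = params[key] * math.exp(width * (rng.random() - 0.5))
    return proposal


def stochastic_search(start, score, n_iter=5000, seed=0):
    """Greedy stochastic search: apply random moves, keep only improvements."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    current = dict(start)
    current_score = score(current)
    keys = sorted(current)
    for _ in range(n_iter):
        key = rng.choice(keys)  # pick which parameter the move acts on
        proposal = scale_move(current, key, rng)
        proposal_score = score(proposal)
        if proposal_score > current_score:
            current, current_score = proposal, proposal_score
    return current, current_score


start = {"dup_rate": 1.0, "loss_rate": 1.0}
best, best_score = stochastic_search(start, toy_log_score)
```

Because proposals are accepted only when they improve the score, the returned state never scores worse than the starting state. A sampler that also accepts some worse proposals (for example, Metropolis-Hastings) would explore the space more broadly at the cost of extra bookkeeping.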
