Species Trees are Recoverable from Unrooted Gene Tree Topologies Under a Constant Rate of Horizontal Gene Transfer

Reconstructing the tree of life from molecular sequences is a fundamental problem in computational biology. Modern data sets often contain a large number of genes, which can complicate reconstruction because different genes may have different evolutionary histories. This is the case in particular in the presence of horizontal gene transfer (HGT), where a gene is inherited from a distant species rather than an immediate ancestor. Such an event produces a gene tree that is distinct from, but related to, the species phylogeny. In previous work, a natural stochastic model of HGT was introduced and studied. It was shown, in both simulation and theoretical studies, that a species phylogeny can be reconstructed from gene trees despite surprisingly high rates of HGT under this model. Rigorous lower and upper bounds on this achievable rate were also obtained, but a large gap remained. Here we close this gap, up to a constant. Specifically, we show that a species phylogeny can be reconstructed correctly from gene trees even when, on each gene, each edge of the species tree has a constant probability of being the location of an HGT event. Our new reconstruction algorithm, which relies only on unrooted gene tree topologies, builds the tree recursively from the leaves and runs in polynomial time. We also provide a matching bound in the negative direction (up to a constant) and extend our results to some cases where gene trees are not perfectly known.
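
To make concrete what "relies only on unrooted gene tree topologies" can mean in practice, the following minimal Python sketch reads off the topology that each gene tree induces on four taxa and takes a plurality vote across genes. It is an illustration only and is not the recursive algorithm described in the abstract. The representation of a gene tree as a set of bipartition sides, and the function names `quartet_topology` and `plurality_quartet`, are assumptions made for this example.

```python
from collections import Counter

def quartet_topology(splits, a, b, c, d):
    """Topology induced on {a, b, c, d} by an unrooted tree.

    `splits` is an iterable of frozensets, each holding the leaves on one
    side of an internal edge (hypothetical representation for this sketch).
    Returns a frozenset of two frozensets (the pairing), or None if the
    tree does not resolve the quartet.
    """
    four = {a, b, c, d}
    for side in splits:
        inside = four & side
        if len(inside) == 2:  # this edge separates two taxa from the other two
            return frozenset({frozenset(inside), frozenset(four - inside)})
    return None

def plurality_quartet(gene_trees, a, b, c, d):
    """Most frequent induced quartet topology across the gene trees."""
    votes = Counter()
    for splits in gene_trees:
        topology = quartet_topology(splits, a, b, c, d)
        if topology is not None:
            votes[topology] += 1
    return votes.most_common(1)[0][0] if votes else None

# Three toy gene trees on taxa A, B, C, D: two display AB|CD and one,
# as might happen after a transfer event, displays AC|BD.
gene_trees = [
    {frozenset("AB")},
    {frozenset("AB")},
    {frozenset("AC")},
]
winner = plurality_quartet(gene_trees, "A", "B", "C", "D")
```

In this toy input the vote selects the pairing of A with B against C with D, since two of the three gene trees agree on it. Only topologies enter the computation; no branch lengths or rooting are used.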
