Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models

Background

Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of these data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large data sets.

Results

We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets.

Conclusion

Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
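To make the gene tree parsimony objective concrete, the following is a minimal sketch of how a candidate species tree can be scored against a set of gene trees. It counts gene duplications only, using the standard least-common-ancestor (LCA) mapping; losses and deep coalescences are left out for brevity. The nested-tuple tree encoding and the function names (`clusters`, `lca_cluster`, `duplications`, `gtp_score`) are assumptions made for this illustration. The naive cluster scan is chosen for readability and is not the efficient SPR/TBR local search algorithm described in the abstract.

```python
# Illustrative sketch only: trees are rooted binary nested tuples whose
# leaves are species names (hypothetical encoding, not from the paper).

def clusters(tree):
    """Return the leaf set of every node of a species tree (root last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = clusters(left), clusters(right)
    # The last entry of each list is the cluster of that subtree's root.
    return cl + cr + [cl[-1] | cr[-1]]

def lca_cluster(species_clusters, species_set):
    """LCA mapping: the smallest species-tree cluster containing species_set."""
    return min((c for c in species_clusters if species_set <= c), key=len)

def duplications(gene_tree, species_clusters):
    """Return (duplication count, LCA cluster, species set) for gene_tree."""
    if isinstance(gene_tree, str):
        s = frozenset([gene_tree])
        return 0, lca_cluster(species_clusters, s), s
    left, right = gene_tree
    dl, ml, sl = duplications(left, species_clusters)
    dr, mr, sr = duplications(right, species_clusters)
    s = sl | sr
    m = lca_cluster(species_clusters, s)
    # A gene node is a duplication if it maps to the same species node
    # as at least one of its children.
    dup = 1 if m in (ml, mr) else 0
    return dl + dr + dup, m, s

def gtp_score(species_tree, gene_trees):
    """Total number of duplications needed to reconcile all gene trees."""
    sc = clusters(species_tree)
    return sum(duplications(g, sc)[0] for g in gene_trees)

gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
best = min(candidates, key=lambda s: gtp_score(s, gene_trees))
print(best, gtp_score(best, gene_trees))  # (('A', 'B'), 'C') 1
```

A local search heuristic would replace the fixed `candidates` list with the SPR or TBR neighborhood of the current species tree and repeat until no neighbor has a lower score. Rescoring every neighbor from scratch, as this sketch would, is the kind of cost that faster algorithms aim to avoid.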
