Efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence

Background
Gene tree–species tree reconciliation problems infer the patterns and processes of gene evolution within a species tree. Gene tree parsimony approaches seek the evolutionary scenario that implies the fewest gene duplications, duplications and losses, or deep coalescence (incomplete lineage sorting) events needed to reconcile a gene tree and a species tree. While a gene tree parsimony approach can be informative about genome evolution and phylogenetics, error in gene trees can profoundly bias the results.

Results
We introduce efficient algorithms that rapidly search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a given gene tree to identify a topology that implies the fewest duplications, duplications and losses, or deep coalescence events. These algorithms improve on the running time of current solutions by a factor of n for searching SPR neighborhoods and n² for searching TBR neighborhoods, where n is the number of taxa in the given gene tree. They provide a fast error correction protocol that ameliorates the effects of gene tree error by allowing small rearrangements in the topology to improve the reconciliation cost. We also demonstrate a simple protocol for using the gene tree rearrangement algorithms to improve gene tree parsimony phylogenetic analyses.

Conclusions
The new gene tree rearrangement algorithms provide a fast method to address gene tree error. They make no assumptions about the underlying processes of genome evolution, and they are amenable to analyses of large-scale genomic data sets. These algorithms are also easily incorporated into gene tree parsimony phylogenetic analyses, potentially producing more credible estimates of reconciliation cost.
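The three cost models named in the abstract can all be read off the standard least common ancestor (LCA) mapping M, which sends each gene tree node to the most recent species tree node whose clade contains every species below it. The Python sketch below illustrates those textbook definitions under stated assumptions; it is not the authors' implementation. It assumes rooted binary trees written as nested tuples of species names, assumes every species contributes at least one gene, and uses hypothetical helper names (`reconciliation_costs`, `index_species_tree`, `lca`). A gene node counts as a duplication when it maps to the same species node as one of its children; losses and deep coalescences are counted from the species tree path lengths between each gene node's image and its parent's image.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf = species name, internal = pair
Node = frozenset  # a species tree node is identified by its leaf set (cluster)


def leaves(t: Tree) -> frozenset:
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])


def index_species_tree(s: Tree) -> Tuple[Dict[Node, Node], Dict[Node, int]]:
    """Parent and depth (edges from the root) of every species tree node."""
    parent: Dict[Node, Node] = {}
    depth: Dict[Node, int] = {}

    def walk(t: Tree, d: int) -> Node:
        if isinstance(t, str):
            node = frozenset([t])
        else:
            left, right = walk(t[0], d + 1), walk(t[1], d + 1)
            node = left | right
            parent[left] = parent[right] = node
        depth[node] = d
        return node

    walk(s, 0)
    return parent, depth


def lca(a: Node, b: Node, parent: Dict[Node, Node], depth: Dict[Node, int]) -> Node:
    """Naive LCA by climbing; a constant-time LCA query structure could replace it."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconciliation_costs(g: Tree, s: Tree) -> Dict[str, int]:
    """Duplication, duplication-loss and deep coalescence costs of g against s."""
    # Assumption of this sketch: every species has at least one gene in g.
    if leaves(g) != leaves(s):
        raise ValueError("gene tree must cover exactly the species tree's leaves")
    parent, depth = index_species_tree(s)
    dup = loss = path = 0

    def walk(t: Tree) -> Node:  # returns M(t), the LCA mapping of t
        nonlocal dup, loss, path
        if isinstance(t, str):
            return frozenset([t])
        m1, m2 = walk(t[0]), walk(t[1])
        m = lca(m1, m2, parent, depth)
        d1, d2 = depth[m1] - depth[m], depth[m2] - depth[m]
        path += d1 + d2  # species edges spanned by the two child gene edges
        if m in (m1, m2):  # duplication: t maps to the same node as a child
            dup += 1
            loss += d1 + d2
        else:  # speciation: the children descend into different branches
            loss += d1 + d2 - 2
        return m

    walk(g)
    # Extra lineages: total gene-edge path length minus one lineage per species edge.
    deep_coal = path - (len(depth) - 1)
    return {"dup": dup, "duploss": dup + loss, "deepcoal": deep_coal}
```

For instance, reconciling the gene tree ((a,c),b) with the species tree ((a,b),c) implies one duplication, three losses, and one deep coalescence under these definitions.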
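To make the idea of searching a local SPR neighborhood concrete, the following Python sketch enumerates every rooted SPR neighbor of a gene tree by brute force and keeps the one implying the fewest duplications. It re-scores each neighbor from scratch, so it is the kind of naive baseline that the algorithms described in the abstract speed up, not those algorithms themselves. The nested-tuple representation and all function names (`spr_neighbors`, `dup_cost`, `best_spr_neighbor`, and so on) are hypothetical; TBR moves and the duplication-loss and deep coalescence scores are omitted for brevity.

```python
import itertools
from typing import FrozenSet, Iterator, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf = species name, internal = pair
Path = Tuple[int, ...]                     # 0/1 steps from the root to a node


def clusters(s: Tree) -> List[FrozenSet[str]]:
    """All clusters (leaf sets) of the species tree; the root cluster comes last."""
    if isinstance(s, str):
        return [frozenset([s])]
    left, right = clusters(s[0]), clusters(s[1])
    return left + right + [left[-1] | right[-1]]


def dup_cost(g: Tree, species_clusters: List[FrozenSet[str]]) -> int:
    """Number of duplications implied by the LCA mapping of g into the species tree."""
    def walk(t: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(t, str):
            return frozenset([t]), 0
        (m1, d1), (m2, d2) = walk(t[0]), walk(t[1])
        # M(t): the smallest species cluster containing both children's images.
        m = min((c for c in species_clusters if (m1 | m2) <= c), key=len)
        return m, d1 + d2 + int(m in (m1, m2))
    return walk(g)[1]


def paths(t: Tree, prefix: Path = ()) -> Iterator[Path]:
    """Yield the path to every node of t in preorder."""
    yield prefix
    if not isinstance(t, str):
        yield from paths(t[0], prefix + (0,))
        yield from paths(t[1], prefix + (1,))


def get(t: Tree, p: Path) -> Tree:
    for i in p:
        t = t[i]
    return t


def replace(t: Tree, p: Path, new: Tree) -> Tree:
    """Return a copy of t with the subtree at path p replaced by new."""
    if not p:
        return new
    child = replace(t[p[0]], p[1:], new)
    return (child, t[1]) if p[0] == 0 else (t[0], child)


def spr_neighbors(t: Tree) -> Iterator[Tree]:
    """Rooted SPR: prune a non-root subtree, then regraft it onto every edge
    of the remaining tree, including a new edge above its root."""
    for p in paths(t):
        if not p:
            continue  # the whole tree cannot be pruned
        sub = get(t, p)
        sibling = get(t, p[:-1] + (1 - p[-1],))
        rest = replace(t, p[:-1], sibling)  # suppress the pruned subtree's parent
        for q in paths(rest):
            yield replace(rest, q, (get(rest, q), sub))


def best_spr_neighbor(g: Tree, s: Tree) -> Tuple[Tree, int]:
    """Brute-force search of g and its SPR neighborhood for the fewest duplications."""
    cs = clusters(s)
    candidates = itertools.chain([g], spr_neighbors(g))
    best = min(candidates, key=lambda t: dup_cost(t, cs))
    return best, dup_cost(best, cs)


best, cost = best_spr_neighbor((("a", "c"), "b"), (("a", "b"), "c"))
print(best, cost)
```

Here the gene tree ((a,c),b) implies one duplication against the species tree ((a,b),c); a single SPR move that makes c the sister of the (a, b) pair yields a neighbor with no duplications.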
