Reconciliation and local gene tree rearrangement can be of mutual profit

Background
Reconciliation methods compare gene trees and species trees to recover evolutionary events such as duplications, transfers and losses that explain the history and composition of genomes. It is well known that gene trees inferred from molecular sequences can be partly erroneous due to incorrect sequence alignments as well as phylogenetic reconstruction artifacts such as long branch attraction. In practice, this leads reconciliation methods to overestimate the number of evolutionary events. Several methods have been proposed to circumvent this problem, either by collapsing the unsupported edges and then resolving the resulting multifurcating nodes, or by directly rearranging the binary gene trees. Yet these methods have been defined for models of evolution accounting only for duplications and losses, and so cannot be applied to prokaryotic gene families.

Results
We propose a reconciliation method accounting for gene duplications, losses and horizontal transfers that specifically takes into account the uncertainties in gene trees by rearranging their weakly supported edges. Rearrangements are performed on edges having a low confidence value, and are accepted whenever they improve the reconciliation cost. We prove useful properties of the dynamic programming matrix used to compute reconciliations, which allow us to speed up the exploration of tree space when rearrangements are generated by Nearest Neighbor Interchange (NNI) edit operations. Experiments on synthetic data show that gene trees modified by such NNI rearrangements are closer to the correct simulated trees and lead to better event predictions on average. Experiments on real data demonstrate that the proposed method leads to a decrease in the reconciliation cost and in the number of inferred events. Finally, on a dataset of 30k gene families, this reconciliation method yields a ranking of prokaryotic phyla by transfer rates identical to that proposed by a different approach dedicated to transfer detection [BMCBIOINF 11:324, 2010, PNAS 109(13):4962–4967, 2012].

Conclusions
Prokaryotic gene trees can now be reconciled with their species phylogeny while accounting for the uncertainty of the gene tree. More accurate and more precise reconciliations are obtained with respect to previous parsimony algorithms not accounting for such uncertainties [LNCS 6398:93–108, 2010, BIOINF 28(12): i283–i291, 2012]. Software implementing the method is freely available at http://www.atgc-montpellier.fr/Mowgli/.
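The search strategy summarized above (apply NNI rearrangements only around weakly supported edges, and accept a rearrangement whenever it lowers the cost) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce the dynamic programming speed-up. The tree encoding, the function names `nni_neighbors`, `hill_climb` and `toy_cost`, and the support threshold of 0.8 are all assumptions made for illustration. In particular, `toy_cost` is only a stand-in that counts gene tree clades absent from the species tree; a real application would plug in a duplication-transfer-loss reconciliation cost instead.

```python
from typing import Callable, Iterator, Tuple, Union

# Hypothetical encoding: a leaf is a species name; an internal node is
# (left, right, support), where support is the confidence of the edge above it.
Tree = Union[str, tuple]


def nni_neighbors(tree: Tree, threshold: float) -> Iterator[Tree]:
    """Yield rooted trees one NNI away, touching only edges with support < threshold."""
    if isinstance(tree, str):
        return
    left, right, sup = tree
    # NNI around the edge above each internal child: swap the child's sibling
    # with one of the child's two subtrees.
    for child, sibling, child_is_left in ((left, right, True), (right, left, False)):
        if not isinstance(child, str) and child[2] < threshold:
            a, b, s = child
            for new_child, new_sibling in (((sibling, b, s), a), ((a, sibling, s), b)):
                if child_is_left:
                    yield (new_child, new_sibling, sup)
                else:
                    yield (new_sibling, new_child, sup)
    # Rearrangements located deeper in either subtree.
    for alt in nni_neighbors(left, threshold):
        yield (alt, right, sup)
    for alt in nni_neighbors(right, threshold):
        yield (left, alt, sup)


def hill_climb(tree: Tree, cost: Callable[[Tree], float],
               threshold: float = 0.8) -> Tuple[Tree, float]:
    """Accept any NNI neighbor that strictly lowers the cost until none does."""
    best, best_cost = tree, cost(tree)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(best, threshold):
            c = cost(candidate)
            if c < best_cost:
                best, best_cost, improved = candidate, c, True
                break
    return best, best_cost


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf names below a node."""
    return frozenset([tree]) if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])


def clades(tree: Tree) -> set:
    """Return the leaf sets of all internal nodes."""
    if isinstance(tree, str):
        return set()
    return {leaves(tree)} | clades(tree[0]) | clades(tree[1])


def toy_cost(species: Tree) -> Callable[[Tree], float]:
    """Stand-in cost (NOT a reconciliation cost): gene clades missing from the species tree."""
    reference = clades(species)
    return lambda gene: len(clades(gene) - reference)


species_tree = (("A", "B", 1.0), ("C", "D", 1.0), 1.0)
gene_tree = ((("A", "C", 0.3), "B", 0.95), "D", 1.0)  # only the (A, C) edge is weak
best_tree, best_cost = hill_climb(gene_tree, toy_cost(species_tree))
```

In this toy run only the edge above (A, C) falls below the threshold, so the search regroups A with B but leaves the well-supported edge untouched, even though rearranging it would lower the stand-in cost further. Restricting moves to low-confidence edges keeps the search from overriding parts of the gene tree that the sequence data support strongly.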

[1]  Ali Tofigh,et al.  Using Trees to Capture Reticulate Evolution : Lateral Gene Transfers and Cancer Progression , 2009 .

[2]  Ran Libeskind-Hadas,et al.  Jane: a new tool for the cophylogeny reconstruction problem , 2010, Algorithms for Molecular Biology.

[3]  V. Daubin,et al.  Modeling gene family evolution and reconciling phylogenetic discord. , 2012, Methods in molecular biology.

[4]  Krister M. Swenson,et al.  An Optimal Reconciliation Algorithm for Gene Trees with Polytomies , 2012, WABI.

[5]  Vincent Berry,et al.  Models, algorithms and programs for phylogeny reconciliation , 2011, Briefings Bioinform..

[6]  Dannie Durand,et al.  A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction , 2005, RECOMB.

[7]  R. Page Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny. , 2000, Molecular phylogenetics and evolution.

[8]  Michael T. Hallett,et al.  Simultaneous Identification of Duplications and Lateral Gene Transfers , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[9]  Bengt Sennblad,et al.  The gene evolution model and computing its associated probabilities , 2009, JACM.

[10]  Michael J. Sanderson,et al.  R8s: Inferring Absolute Rates of Molecular Evolution, Divergence times in the Absence of a Molecular Clock , 2003, Bioinform..

[11]  M. O. Dayhoff,et al.  The origin and evolution of protein superfamilies. , 1976 .

[12]  Matthew J. Betts,et al.  Optimal Gene Trees from Sequences and Species Trees Using a Soft Interpretation of Parsimony , 2006, Journal of Molecular Evolution.

[13]  Lawrence A. David,et al.  Rapid evolutionary innovation during an Archaean genetic expansion , 2011, Nature.

[14]  D. Robinson,et al.  Comparison of phylogenetic trees , 1981 .

[15]  Michael T. Hallett,et al.  Simultaneous identification of duplications and lateral transfers , 2004, RECOMB.

[16]  Louxin Zhang,et al.  Reconciliation of Gene and Species Trees With Polytomies , 2012, 1201.3995.

[17]  Ali Tofigh,et al.  Using Trees to Capture Reticulate Evolution , 2009 .

[18]  M. O. Dayhoff,et al.  The origin and evolution of protein superfamilies. , 1976, Federation proceedings.

[19]  Oliver Eulenstein,et al.  Algorithms: simultaneous error-correction and rooting for gene tree reconciliation and the gene duplication problem , 2012, BMC Bioinformatics.

[20]  Dannie Durand,et al.  Reconciliation with non-binary species trees. , 2008, Journal of computational biology : a journal of computational molecular cell biology.

[21]  Matthew W. Hahn,et al.  Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution , 2007, Genome Biology.

[22]  Oliver Eulenstein,et al.  Reconciling Gene Trees with Apparent Polytomies , 2006, COCOON.

[23]  Tandy J. Warnow,et al.  Reconstructing reticulate evolution in species: theory and practice , 2004, RECOMB.

[24]  Kousha Etessami,et al.  Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations , 2005, JACM.

[25]  Vincent Berry,et al.  An Efficient Algorithm for Gene/Species Trees Parsimonious Reconciliation with Losses, Duplications and Transfers , 2010, RECOMB-CG.

[26]  D. Kendall On the Generalized "Birth-and-Death" Process , 1948 .

[27]  Sophie S Abby,et al.  Lateral gene transfer as a support for the tree of life , 2012, Proceedings of the National Academy of Sciences.

[28]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[29]  Manolo Gouy,et al.  Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests , 2010, BMC Bioinformatics.

[30]  Sophie S Abby,et al.  Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations , 2012, Proceedings of the National Academy of Sciences.

[31]  Manolis Kellis,et al.  Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss , 2012, Bioinform..

[32]  B Vernot,et al.  Reconciliation with Non-Binary Species Trees , 2007, J. Comput. Biol..

[33]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[34]  Ran Libeskind-Hadas,et al.  On the Computational Complexity of the Reticulate Cophylogeny Reconstruction Problem , 2009, J. Comput. Biol..

[35]  Oliver Eulenstein,et al.  Algorithms for Rapid Error Correction for the Gene Duplication Problem , 2011, ISBRA.

[36]  Bin Ma,et al.  From Gene Trees to Species Trees , 2000, SIAM J. Comput..

[37]  Guy Perrière,et al.  Databases of homologous gene families for comparative genomics , 2009, BMC Bioinformatics.

[38]  Pawel Górecki,et al.  Reconciliation problems for duplication, loss and horizontal gene transfer , 2004, RECOMB.

[39]  Andrew Rambaut,et al.  Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees , 1997, Comput. Appl. Biosci..

[40]  G. Moore,et al.  Fitting the gene lineage into its species lineage , 1979 .

[41]  N. Galtier A model of horizontal gene transfer and the bacterial phylogeny problem. , 2007, Systematic biology.

[42]  Donald E. Knuth,et al.  The art of computer programming, volume 3: (2nd ed.) sorting and searching , 1998 .

[43]  Ran Libeskind-Hadas,et al.  The Cophylogeny Reconstruction Problem Is NP-Complete , 2011, J. Comput. Biol..

[44]  Dannie Durand,et al.  Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees , 2012, Bioinform..