Gene tree correction for reconciliation and species tree inference

BackgroundReconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists in “embedding” inferred gene trees into a known species tree, revealing the evolution of the gene family by duplications and losses. When a species tree is not known, a natural algorithmic problem is to infer a species tree from a set of gene trees, such that the corresponding reconciliation minimizes the number of duplications and/or losses. The main drawback of reconciliation is that the inferred evolutionary scenario is strongly dependent on the considered gene trees, as few misplaced leaves may lead to a completely different history, with significantly more duplications and losses.ResultsIn this paper, we take advantage of certain gene trees’ properties in order to preprocess them for reconciliation or species tree inference. We flag certain duplication vertices of a gene tree, the “non-apparent duplication” (NAD) vertices, as resulting from the misplacement of leaves. In the case of species tree inference, we develop a polynomial-time heuristic for removing the minimum number of species leading to a set of gene trees that exhibit no NAD vertices with respect to at least one species tree. In the case of reconciliation, we consider the optimization problem of removing the minimum number of leaves or species leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.

[1]  G. Moore,et al.  Fitting the gene lineage into its species lineage , 1979 .

[2]  Temple F. Smith,et al.  Reconstruction of ancient molecular phylogeny. , 1996, Molecular phylogenetics and evolution.

[3]  Mikkel Thorup,et al.  On the Agreement of Many Trees , 1995, Inf. Process. Lett..

[4]  Claude Berge,et al.  Hypergraphs - combinatorics of finite sets , 1989, North-Holland mathematical library.

[5]  Jianzhi Zhang Evolution by gene duplication: an update , 2003 .

[6]  R. Page Maps between trees and cladistic analysis of historical associations among genes , 1994 .

[7]  Nadia El-Mabrouk,et al.  New Perspectives on Gene Family Evolution: Losses in Reconciliation and a Link with Supertrees , 2009, RECOMB.

[8]  Dannie Durand,et al.  A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction , 2005, RECOMB.

[9]  Nadia El-Mabrouk,et al.  Gene Family Evolution by Duplication, Speciation and Loss , 2022 .

[10]  Matthew W. Hahn,et al.  Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution , 2007, Genome Biology.

[11]  Roderic D. M. Page,et al.  Modified Mincut Supertrees , 2002, WABI.

[12]  Paola Bonizzoni,et al.  Reconciling a gene tree to a species tree under the duplication cost model , 2005, Theor. Comput. Sci..

[13]  A. D. Gordon,et al.  Obtaining common pruned trees , 1985 .

[14]  D. Bryant Building trees, hunting for trees, and comparing trees : theory and methods in phylogenetic analysis , 1997 .

[15]  Mira V. Han,et al.  Gene Family Evolution across 12 Drosophila Genomes , 2007, PLoS genetics.

[16]  Roderic D. M. Page,et al.  GeneTree: comparing gene and species phylogenies using reconciled trees , 1998, Bioinform..

[17]  Dannie Durand,et al.  Notung: dating gene duplications using gene family trees , 2000, RECOMB '00.

[18]  Amihood Amir,et al.  Maximum Agreement Subtree in a Set of Evolutionary Trees: Metrics and Efficient Algorithms , 1997, SIAM J. Comput..

[19]  Riccardo Dondi,et al.  Minimum Leaf Removal for Reconciliation: Complexity and Algorithms , 2012, CPM.

[20]  Michael T. Hallett,et al.  Simultaneous identification of duplications and lateral transfers , 2004, RECOMB.

[21]  Michael A. Charleston,et al.  Reconciled trees and incongruent gene and species trees , 1996, Mathematical Hierarchies and Biology.

[22]  Michael T. Hallett,et al.  New algorithms for the duplication-loss model , 2000, RECOMB '00.

[23]  Mikkel Thorup,et al.  An O(n log n) algorithm for the maximum agreement subtree problem for binary trees , 1996, SODA '96.

[24]  Dannie Durand,et al.  A hybrid micro-macroevolutionary approach to gene tree reconstruction. , 2006 .

[25]  Nadia El-Mabrouk,et al.  Removing Noise from Gene Trees , 2011, WABI.

[26]  Z. Gu,et al.  Evolutionary analyses of the human genome , 2001, Nature.

[27]  Satish Rao,et al.  Using Max Cut to Enhance Rooted Trees Consistency , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[28]  Bin Ma,et al.  From Gene Trees to Species Trees , 2000, SIAM J. Comput..

[29]  Raphael Clifford,et al.  Combinatorial Pattern Matching (CPM) , 2011 .

[30]  Jerzy Tiuryn,et al.  DLS-trees: A model of evolutionary scenarios , 2006, Theor. Comput. Sci..

[31]  Dannie Durand,et al.  NOTUNG: A Program for Dating Gene Duplications and Optimizing Gene Family Trees , 2000, J. Comput. Biol..

[32]  Michael T. Hallett,et al.  Simultaneous Identification of Duplications and Lateral Gene Transfers , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[33]  Charles Semple,et al.  A supertree method for rooted trees , 2000, Discret. Appl. Math..

[34]  Krister M. Swenson,et al.  An Optimal Reconciliation Algorithm for Gene Trees with Polytomies , 2012, WABI.

[35]  James B. Orlin,et al.  A faster algorithm for finding the minimum cut in a graph , 1992, SODA '92.

[36]  Tandy J. Warnow,et al.  Kaikoura Tree Theorems: Computing the Maximum Agreement Subtree , 1993, Inf. Process. Lett..

[37]  Oliver Eulenstein,et al.  Reconciling Gene Trees with Apparent Polytomies , 2006, COCOON.

[38]  Louxin Zhang,et al.  On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies , 1997, J. Comput. Biol..