Species tree estimation from multiple markers is complicated by the fact that gene trees can differ from each other (and from the true species tree) due to several biological processes, one of which is gene duplication and loss. Local search heuristics for two NP-hard optimization problems, minimize gene duplications (MGD) and minimize gene duplications and losses (MGDL), are popular techniques for estimating species trees in the presence of gene duplication and loss. In this paper, we present an alternative approach to solving MGD and MGDL from rooted gene trees. First, we characterize each tree in terms of its "subtree-bipartitions" (a concept we introduce). Then we show that the MGD species tree is defined by a maximum weight clique in a vertex-weighted graph that can be computed from the subtree-bipartitions of the input gene trees, and that the MGDL species tree is defined by a minimum weight clique in a similarly constructed graph. We also show that, because of the special structure of these graphs, the optimal cliques can be found in time polynomial in the number of vertices of the graph using a dynamic programming algorithm (similar to that of Hallett and Lagergren (1)). Finally, we show that a constrained version of these problems, where the subtree-bipartitions of the species tree are drawn from the subtree-bipartitions of the input gene trees, can be solved in time that is polynomial in the number of gene trees and taxa. We have implemented our dynamic programming algorithm in a publicly available software tool at http://www.cs.utexas.edu/users/phylo/software/dynadup/.
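As an illustration of the first step, the following minimal Python sketch collects subtree-bipartitions from a set of rooted binary gene trees. It rests on assumptions that the abstract does not spell out: a subtree-bipartition of an internal node is taken to be the unordered pair of species sets found below its two children, trees are encoded as nested 2-tuples with species names at the leaves, and the function names are hypothetical. It does not reproduce the weighting or the dynamic programming used in the tool.

```python
from collections import Counter


def subtree_bipartitions(tree):
    """Return (cluster, bipartitions) for a rooted binary tree.

    Assumed encoding: a leaf is a species name (str); an internal
    node is a 2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    left, right = tree
    a, left_bips = subtree_bipartitions(left)
    b, right_bips = subtree_bipartitions(right)
    # Canonical order so that (A|B) and (B|A) give the same key.
    # The two sides may overlap when a species has several gene copies.
    bip = tuple(sorted((a, b), key=sorted))
    return a | b, left_bips + right_bips + [bip]


def count_subtree_bipartitions(gene_trees):
    """Count how often each subtree-bipartition occurs across gene trees."""
    counts = Counter()
    for tree in gene_trees:
        _, bips = subtree_bipartitions(tree)
        counts.update(bips)
    return counts


if __name__ == "__main__":
    trees = [(("a", "b"), "c"), (("a", "b"), "c"), (("a", "c"), "b")]
    for (side_a, side_b), n in count_subtree_bipartitions(trees).items():
        print(sorted(side_a), "|", sorted(side_b), "seen", n, "time(s)")
```

In the constrained version of the problems, the keys of such a counter would be the candidate subtree-bipartitions from which the species tree is assembled; how each candidate is weighted is defined in the paper and is not attempted here.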
[1] Tandy J. Warnow, et al. Algorithms for MDC-Based Multi-Locus Phylogeny Inference: Beyond Rooted Binary Gene Trees on Single Alleles. J. Comput. Biol., 2011.
[2] Ulrike Stege, et al. Gene Trees and Species Trees: The Gene-Duplication Problem is Fixed-Parameter Tractable. WADS, 1999.
[3] Pawel Górecki, et al. Reconciliation problems for duplication, loss and horizontal gene transfer. RECOMB, 2004.
[4] Roderic D. M. Page, et al. Vertebrate Phylogenomics: Reconciled Trees and Gene Duplications. Pacific Symposium on Biocomputing, 2001.
[5] Michael T. Hallett, et al. New algorithms for the duplication-loss model. RECOMB '00, 2000.
[6] Hamid R. Arabnia, et al. Software Tools and Algorithms for Biological Systems. 2013.
[7] Bin Ma, et al. On reconstructing species trees from gene trees in term of duplications and losses. RECOMB '98, 1998.