Polynomial-Time Algorithms for Building a Consensus MUL-Tree

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
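To make the strict consensus idea concrete, the sketch below computes the clusters shared by every input MUL-tree. It assumes a cluster-based reading in which a cluster is the multiset of leaf labels below a node, and a cluster is kept with the smallest multiplicity it has in any input tree. That reading, the nested-tuple tree encoding, and the names `clusters` and `common_clusters` are illustrative assumptions rather than details taken from the abstract. The sketch does not assemble the consensus MUL-tree itself.

```python
from collections import Counter

# Hypothetical encoding: a leaf is a label string, an internal node is a
# tuple of child subtrees. Labels may repeat, as in a MUL-tree.


def clusters(tree):
    """Return (leaf label multiset, Counter of clusters) for a subtree.

    Each cluster is stored as a sorted tuple of labels so that it can be
    used as a dictionary key while still preserving multiplicities.
    """
    if isinstance(tree, str):
        labels = Counter([tree])
        return labels, Counter([(tree,)])
    labels = Counter()
    found = Counter()
    for child in tree:
        child_labels, child_found = clusters(child)
        labels += child_labels
        found += child_found
    # Record the cluster rooted at this internal node.
    found[tuple(sorted(labels.elements()))] += 1
    return labels, found


def common_clusters(trees):
    """Clusters present in every tree, with minimum multiplicity (assumed rule)."""
    collections = [clusters(t)[1] for t in trees]
    common = collections[0]
    for other in collections[1:]:
        common = common & other  # Counter intersection keeps the minimum count
    return common


# Two small MUL-trees over the same leaf label multiset {a, a, b, c}.
t1 = (("a", "b"), ("a", "c"))
t2 = (("a", "a"), ("b", "c"))
shared = common_clusters([t1, t2])
```

Here only the leaf clusters and the root cluster survive, because the two trees disagree on every other internal node. Using `Counter` keeps the multiplicity bookkeeping explicit, which matters because a cluster can occur more than once in a single MUL-tree.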
