Computing a Consensus of Multilabeled Trees

In this paper we consider two challenging problems that arise in the context of computing a consensus of a collection of multilabeled trees: (1) selecting a compatible collection of clusters on a multiset from an ordered list of such clusters, and (2) optimally refining high-degree vertices in a multilabeled tree. Forming such a consensus is part of an approach to reconstructing the evolutionary history of a set of species in which events such as genome duplication and hybridization have occurred. We present exact algorithms for solving (1) and (2) that have exponential run time in the worst case. To give some impression of their performance in practice, we apply them to simulated input and to a real biological data set, highlighting the impact of several structural properties of the input on their performance.
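To make problem (1) concrete, the following is a minimal sketch of its much simpler single-labeled analogue, not the algorithm of this paper. It assumes clusters on an ordinary set, where two clusters are compatible exactly when they are nested or disjoint, and it keeps each cluster from the ordered list that is compatible with everything kept so far. The names `compatible` and `greedy_select` are hypothetical. For clusters on a multiset, as considered here, compatibility is more involved and this pairwise test does not apply.

```python
from typing import FrozenSet, Iterable, List

Cluster = FrozenSet[str]


def compatible(a: Cluster, b: Cluster) -> bool:
    """Single-labeled case only: clusters are compatible if nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def greedy_select(ordered: Iterable[Cluster]) -> List[Cluster]:
    """Scan clusters in the given order, keeping each one that is
    compatible with every cluster already kept."""
    kept: List[Cluster] = []
    for cluster in ordered:
        if all(compatible(cluster, k) for k in kept):
            kept.append(cluster)
    return kept


# Illustrative input (made up): {b, c} overlaps {a, b} without nesting,
# so it is rejected; the other clusters are kept.
clusters = [frozenset("ab"), frozenset("bc"), frozenset("abc"), frozenset("de")]
selected = greedy_select(clusters)
```

The order of the input list determines which of two conflicting clusters survives, which is why the problem is stated for an ordered list.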
