On a matching distance between rooted phylogenetic trees

Abstract. The Robinson-Foulds (RF) distance is the most popular method of evaluating the dissimilarity between phylogenetic trees. In this paper, we define the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees, and explore its properties in detail. Like RF, MC operates on the clusters of the compared trees, but the distance evaluation is more complex. Using a graph-theoretic approach based on a minimum-weight perfect matching in bipartite graphs, the values of similarity between clusters are transformed into the final MC score of the dissimilarity of the trees. The analyzed properties give insight into the structure of the metric space generated by MC, its relation to the Matching Split (MS) distance of unrooted trees, and the asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly at random, which is Θ(n^{3/2}) for both MC and MS.
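The matching step described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: trees are given as nested tuples with string leaves, the cost of pairing two clusters is the size of their symmetric difference, and the smaller cluster list is padded with empty clusters so that a perfect matching exists. The names `clusters` and `mc_distance` are hypothetical, and the matching is found by brute force over permutations rather than by a polynomial-time assignment algorithm, so the sketch is suitable only for very small trees.

```python
from itertools import permutations


def clusters(tree):
    """Collect the leaf sets below every internal node of a rooted tree.

    Assumption: a tree is a nested tuple whose leaves are strings,
    e.g. ((("a", "b"), "c"), "d"). Singleton leaf clusters are skipped.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(leaves(child) for child in node))
        found.add(cluster)
        return cluster

    leaves(tree)
    return found


def mc_distance(tree1, tree2):
    """Minimum total cost of a perfect matching between two cluster lists.

    Assumptions: the cost of pairing clusters A and B is |A xor B|, and
    the shorter list is padded with empty clusters. Brute force over all
    permutations stands in for a real assignment solver.
    """
    c1, c2 = list(clusters(tree1)), list(clusters(tree2))
    size = max(len(c1), len(c2))
    c1 += [frozenset()] * (size - len(c1))
    c2 += [frozenset()] * (size - len(c2))
    return min(
        sum(len(a ^ b) for a, b in zip(c1, perm))
        for perm in permutations(c2)
    )


caterpillar = ((("a", "b"), "c"), "d")
swapped = ((("a", "c"), "b"), "d")
balanced = (("a", "b"), ("c", "d"))

print(mc_distance(caterpillar, caterpillar))  # 0
print(mc_distance(caterpillar, swapped))      # 2
print(mc_distance(caterpillar, balanced))     # 3
```

Under these assumptions, a cluster that differs from its partner by one leaf contributes less than a cluster that differs by many, whereas RF counts only whether a cluster is shared or not. This is the sense in which a matching-based score can act as a refinement of RF.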
