Comparing tree shapes: beyond symmetry

This paper describes two types of problems related to tree shapes, together with algorithms for solving them. The first is to compare the unlabelled shapes themselves, rather than merely their degree of balance, in a manner analogous to the routine comparison of topologies for labelled trees. Such a comparison has potential practical applications, such as determining, from tree shape similarity alone, whether the taxa in two phylogenies are likely to correspond (e.g. hosts and parasites with high specificity). It is shown that tree balance is insufficient for this task, and that standard measures of topological difference (Robinson–Foulds distances, SPR distances, or retention indices of the matrices representing the trees, MRPs) can be easily adapted to the problem. The second type of problem is to determine whether taxa of uncertain matching, each unique to one of two different phylogenies, could correspond to each other (e.g. the same species in larvae and adults of metamorphic animals, or fossils known from different body parts). This second problem can be solved either by relabelling taxa so that the number of consensus nodes is maximized, or by relabelling taxa so that the sum of the number of steps in the MRP of each tree mapped onto the other is minimized.
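
As an illustration of the first problem, the sketch below adapts a Robinson–Foulds-style distance to unlabelled rooted shapes by trying every relabelling of one tree and keeping the smallest distance. It is a brute-force toy under stated assumptions, not the algorithm used in the paper: trees are nested Python tuples, every non-tuple value is treated as an anonymous leaf, and the names `count_leaves`, `clusters` and `shape_rf_distance` are hypothetical. Because it enumerates all n! relabellings, it is only usable for very small trees.

```python
from itertools import permutations


def count_leaves(tree):
    """Count leaves; any non-tuple value is treated as an anonymous leaf."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_leaves(child) for child in tree)


def clusters(tree, labels):
    """Return the non-trivial clusters (leaf-label sets of internal nodes).

    `labels` is an iterator that supplies one label per leaf, consumed in
    left-to-right traversal order. The root cluster is excluded because it
    is shared by every tree on the same leaf set.
    """
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([next(labels)])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    root = walk(tree)
    found.discard(root)
    return found


def shape_rf_distance(tree_a, tree_b):
    """Smallest rooted RF distance over all relabellings of tree_b (assumed definition)."""
    n = count_leaves(tree_a)
    if count_leaves(tree_b) != n:
        raise ValueError("both trees must have the same number of leaves")
    reference = clusters(tree_a, iter(range(n)))
    return min(
        len(reference ^ clusters(tree_b, iter(perm)))
        for perm in permutations(range(n))
    )


# Hypothetical four-leaf shapes; "x" marks an unlabelled leaf.
caterpillar = ((("x", "x"), "x"), "x")
balanced = (("x", "x"), ("x", "x"))
reordered = ("x", ("x", ("x", "x")))  # same shape as caterpillar, children swapped

print(shape_rf_distance(caterpillar, reordered))
print(shape_rf_distance(caterpillar, balanced))
```

For two binary trees with the same number of leaves, the relabelling that minimizes this distance is also the one that maximizes the number of shared clusters, which is the same idea as the consensus-node criterion described for the second problem.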
