A Metric on Phylogenetic Tree Shapes

Abstract. The shapes of evolutionary trees are influenced by the nature of the evolutionary process, but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It also separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to define addition and multiplication on trees and to create a convex metric on tree shapes, which formally allows computation of average tree shapes.
