treespace: Statistical exploration of landscapes of phylogenetic trees

The increasing availability of large genomic data sets and the advent of Bayesian phylogenetics facilitate the investigation of phylogenetic incongruence, which can make it impossible to represent phylogenetic relationships with a single tree. Although sometimes considered a nuisance, phylogenetic incongruence can also reflect meaningful biological processes and relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used to identify clusters of similar trees and to derive group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated with existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.
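The general idea of pairing a tree metric with a multivariate method can be illustrated with a small, self-contained sketch. It is not the treespace implementation (which is an R package): it assumes rooted trees written as nested Python tuples, uses the Robinson-Foulds distance on clade sets as one possible tree metric, and uses classical multidimensional scaling (principal coordinates analysis) as one possible projection. The function names `clades`, `rf_distance`, and `classical_mds`, and the four example trees, are hypothetical and chosen only for illustration.

```python
import math


def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of tip labels.

    A tree is a nested tuple of strings, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)  # the root clade is shared by every tree
    return found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: number of clades found in only one tree."""
    return len(clades(t1) ^ clades(t2))


def classical_mds(dist, n_axes=2, n_iter=500):
    """Classical MDS (principal coordinates) via power iteration with deflation."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        v = [math.sin(i + 1 + axis) for i in range(n)]  # deterministic start
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:  # no further positive eigenvalue to project on
            break
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four illustrative trees on tips A-E: the first two are similar to each other,
# as are the last two.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("D", "E"), "A"), ("B", "C")),
    (("D", ("E", "A")), ("B", "C")),
]
dist = [[rf_distance(a, b) for b in trees] for a in trees]
coords = classical_mds(dist)

for label, point in zip(["t1", "t2", "t3", "t4"], coords):
    print(label, [round(x, 3) for x in point])
```

In the resulting two-dimensional map, trees that share most of their clades sit close together, so groups of similar trees appear as clusters of points. Other tree metrics or projection methods could be substituted without changing the overall workflow.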
