On Defining and Finding Islands of Trees and Mitigating Large Island Bias

Abstract

How best can we summarize sets of phylogenetic trees? Systematists have relied heavily on consensus methods, but if tree distributions can be partitioned into distinct subsets, it may be helpful to provide separate summaries of these rather than relying entirely upon a single consensus tree. How sets of trees can most helpfully be partitioned and represented raises many open questions, but one natural partitioning is provided by the islands of trees found during tree searches. Islands of dissimilar size have been shown to yield majority-rule consensus trees dominated by the largest islands. We illustrate this large island bias, and approaches that mitigate its impact, by revisiting a recent analysis of phylogenetic relationships of living and fossil amphibians. We introduce a revised definition of tree islands, based on any tree-to-tree pairwise distance metric. It usefully extends the notion to any set or multiset of trees, such as those produced by Bayesian or bootstrap methods, and it facilitates finding tree islands a posteriori. We extract islands from a tree distribution obtained in a Bayesian analysis of the amphibian data to investigate their impact in that context. We also compare the partitioning produced by tree islands with the partitions produced by some alternative approaches. Distinct subsets of trees, such as tree islands, should be of interest because of what they may reveal about evolution and/or our attempts to understand it. They are an important, sometimes overlooked, consideration when building and interpreting consensus trees. [Amphibia; Bayesian inference; consensus; parsimony; partitions; phylogeny; Chinlestegophis.]
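The a posteriori search for islands and the large island bias mentioned above can be sketched briefly. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that an island at threshold t is a connected component of the graph that links two trees whenever their pairwise distance is at most t. It uses the Robinson-Foulds distance as one example metric; any other distance could be passed in instead. Each unrooted tree on a fixed taxon set is represented as the set of its non-trivial splits, with each split written as the side that does not contain a reference taxon A. The helper names (`tree`, `rf_distance`, `find_islands`, `majority_rule_splits`, `fmt`) and the five-taxon example trees are hypothetical. They were chosen only to show how a majority-rule summary of pooled trees can be dominated by the largest island.

```python
from collections import Counter
from itertools import combinations


def tree(*sides):
    """Hypothetical helper: an unrooted tree as a frozenset of its non-trivial
    splits. Each split is written as the side that does NOT contain the
    reference taxon "A", so identical splits compare equal."""
    return frozenset(frozenset(side) for side in sides)


def rf_distance(a, b):
    """Robinson-Foulds distance, counted here as the number of splits found in
    exactly one of the two trees (some authors halve this value)."""
    return len(a ^ b)


def find_islands(trees, t, dist=rf_distance):
    """Partition a list (or multiset) of trees into islands: connected
    components of the graph joining two trees whenever dist <= t.
    Returns lists of indices into `trees`, largest island first."""
    parent = list(range(len(trees)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of trees once; any metric can be supplied as `dist`.
    for i, j in combinations(range(len(trees)), 2):
        if dist(trees[i], trees[j]) <= t:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(trees)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


def majority_rule_splits(trees):
    """Splits occurring in more than half of the given trees."""
    counts = Counter(split for tr in trees for split in tr)
    return {split for split, c in counts.items() if c > len(trees) / 2}


def fmt(splits):
    """Render splits as sorted strings for readable output."""
    return sorted("".join(sorted(s)) for s in splits)


# Five taxa A-E; each tree lists its two non-trivial splits.
t1 = tree("CDE", "DE")  # ((A,B),C,(D,E))
t2 = tree("BDE", "DE")  # ((A,C),B,(D,E))
t3 = tree("BC", "DE")   # ((B,C),A,(D,E))
t4 = tree("BCE", "CE")  # ((A,D),B,(C,E))
t5 = tree("BCE", "BE")  # ((A,D),C,(B,E))
trees = [t1, t1, t1, t2, t3, t4, t5]  # a multiset: t1 sampled three times

islands = find_islands(trees, t=2)
print([len(isl) for isl in islands])
print("pooled:", fmt(majority_rule_splits(trees)))
for isl in islands:
    print("island", isl, fmt(majority_rule_splits([trees[i] for i in isl])))
```

With t = 2, the seven trees form an island of five and an island of two. The pooled majority-rule splits reduce to DE, which comes from the larger island. Summarizing each island separately gives CDE and DE for the large island and BCE for the small one, whose signal is lost from the pooled summary. Union-find keeps the grouping step simple. The all-pairs distance comparison grows quadratically with the number of trees, which may matter for large posterior samples.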
