Parsimony analysis of phylogenomic datasets (I): scripts and guidelines for using TNT (Tree Analysis using New Technology)

We discuss here the use of TNT (Tree Analysis using New Technology) for phylogenomic analysis. For such data, parsimony is a useful alternative to model‐based analyses, which frequently rely on models that make unrealistic assumptions (e.g. low heterotachy) and which struggle with high levels of missing data, among other problems. Parsimony and model‐based methods often yield trees with few topological differences, and these differences can then be examined further to determine whether they are analytical artefacts. This further examination is facilitated by the greater speed and computational efficiency of parsimony, which allow a more in‐depth analysis of datasets. We briefly describe TNT, the most computationally efficient and versatile parsimony software, which can be used for both phylogenetic and phylogenomic analyses. In particular, we describe and provide a series of scripts designed specifically for the analysis of phylogenomic datasets. These include scripts for concatenating gene data files in different formats, generating plots and datasets with different levels of gene/taxon occupancy, calculating different support measures, and reconstructing phylogenies from concatenated matrices and from single genes. The execution of the scripts is also demonstrated with video clips (https://www.youtube.com/channel/UCpIgK8sVH‐yK0Bo3fK62IxA). Lastly, we describe the main commands and functions that enable efficient phylogenomic analyses in TNT.
