SuperTriplets: a triplet-based supertree approach to phylogenomics

Motivation: Phylogenetic tree-building methods use molecular data to represent the evolutionary history of genes and taxa. A recurrent problem is to reconcile the various phylogenies built from different genomic sequences into a single one. This task is generally carried out with a two-step approach: a binary representation of the initial trees is first inferred, and a maximum parsimony (MP) analysis is then performed on it. This binary representation uses a decomposition of all source trees that is usually based on clades, but that can also be based on triplets or quartets. The relative performance of these representations has been discussed but is difficult to assess, since triplet- and quartet-based representations are both limited to relatively small datasets.

Results: This article focuses on the triplet-based representation of source trees. We first recall how, using this representation, the parsimony analysis is related to the notion of a median tree. We then introduce SuperTriplets, a new algorithm specifically designed to optimize this alternative formulation of the MP criterion. The method avoids several practical limitations of the triplet-based binary matrix representation, making it practical for large datasets. When the correct resolution of every triplet appears more often in the source trees than each incorrect one, SuperTriplets is guaranteed to reconstruct the correct phylogeny. Both simulations and a case study on mammalian phylogenomics confirm the advantages of this approach. In both cases, SuperTriplets tends to propose less resolved but more reliable supertrees than those inferred using Matrix Representation with Parsimony.

Availability: Online and JAVA standalone versions of SuperTriplets are available at http://www.supertriplets.univ-montp2.fr/

Contact: vincent.ranwez@univ-montp2.fr
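As an illustration of the triplet-based representation mentioned above, the following minimal Python sketch decomposes rooted source trees into their resolved triplets and counts how often each resolution occurs. It is not the SuperTriplets algorithm itself. The nested-tuple tree encoding and the function names (`triplets`, `majority_resolutions`) are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def _collect(tree, out):
    """Return the leaf set of `tree` and add each resolved triplet ab|c to `out`.

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    A triplet ab|c is stored as (frozenset({a, b}), c).
    """
    if isinstance(tree, str):
        return {tree}
    child_leaves = [_collect(child, out) for child in tree]
    for i, inside in enumerate(child_leaves):
        # Leaves hanging from the other children of this node.
        outside = set().union(*(s for j, s in enumerate(child_leaves) if j != i))
        # a and b share child i, c sits under another child: the tree displays ab|c.
        for a, b in combinations(sorted(inside), 2):
            for c in outside:
                out.add((frozenset((a, b)), c))
    return set().union(*child_leaves)


def triplets(tree):
    """Set of resolved triplets displayed by a rooted tree."""
    out = set()
    _collect(tree, out)
    return out


def majority_resolutions(source_trees):
    """For each three-taxon set, keep the resolution that strictly outnumbers the others."""
    counts = Counter()
    for tree in source_trees:
        counts.update(triplets(tree))
    by_taxa = {}
    for (pair, c), n in counts.items():
        by_taxa.setdefault(pair | {c}, []).append((n, (pair, c)))
    winners = set()
    for candidates in by_taxa.values():
        candidates.sort(key=lambda item: item[0], reverse=True)
        if len(candidates) == 1 or candidates[0][0] > candidates[1][0]:
            winners.add(candidates[0][1])
    return winners


if __name__ == "__main__":
    sources = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
    print(majority_resolutions(sources))
```

In this toy input, the resolution AB|C appears in two of the three source trees and AC|B in one, so only AB|C is retained. This mirrors the condition stated in the Results, where the correct resolution of a triplet is more frequent than the incorrect ones.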
