A comparison of supermatrix and supertree methods for multilocus phylogenetics using organismal datasets

It has been proposed that supertree approaches should be applied to large multilocus datasets to achieve computational tractability. Under this proposal, a large dataset such as one derived from a phylogenomics study is broken into many locus-specific tree searches, and the resulting trees are combined with a supertree method. Using simulated data, previous studies have reported that a supertree can be constructed rapidly and is comparable to the result of a heuristic tree search on the entire dataset. To test this assertion with organismal data, we compare tree length under the parsimony criterion and computational time for 20 multilocus datasets using supertree (SuperFine and SuperTriplets) and supermatrix (heuristic search in TNT) approaches. Tree lengths and computational times were compared among methods using the Wilcoxon matched-pairs signed rank test. Supermatrix searches produced significantly shorter trees than either supertree approach (SuperFine or SuperTriplets; P < 0.0002 in both cases). Moreover, the processing time of the supermatrix search was significantly lower than that of SuperFine plus locus-specific searches (P < 0.01) and did not differ significantly from that of SuperTriplets plus locus-specific searches (P > 0.4). In conclusion, we show with real rather than simulated data that there is no basis, in either computational tractability or tree length, for preferring supertrees over a heuristic supermatrix tree search in phylogenomics.
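The paired comparison described above can be illustrated with a minimal sketch of the Wilcoxon matched-pairs signed rank test. This is not the authors' analysis code: the function name `wilcoxon_signed_rank` is hypothetical, the exact two-sided P value is computed by enumerating the null distribution (practical only for small numbers of pairs), and the tree lengths in the usage lines are made-up placeholders, not values from the study.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, exact two-sided P) for paired samples x and y.

    Hypothetical helper for illustration; assumes integer-valued inputs
    such as parsimony tree lengths.
    """
    # Paired differences; zero differences are dropped (classic convention).
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    # Ranks are stored doubled so that averaged ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            # Average of ranks pos+1 .. end+1, doubled.
            ranks2[order[k]] = pos + end + 2
        pos = end + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w_minus2 = sum(r for r, d in zip(ranks2, diffs) if d < 0)

    # Exact null distribution of W+: each rank is positive with probability 1/2.
    # counts[s] = number of sign assignments whose positive ranks sum to s.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided P value: by symmetry, twice the lower-tail probability, capped at 1.
    t2 = min(w_plus2, w_minus2)
    p = min(1.0, 2 * sum(counts[: t2 + 1]) / 2 ** n)
    return w_plus2 / 2, w_minus2 / 2, p


# Placeholder tree lengths for six imaginary datasets (NOT data from the study).
supermatrix_lengths = [1200, 845, 2310, 977, 1530, 640]
supertree_lengths = [1215, 850, 2342, 980, 1561, 652]
w_plus, w_minus, p_value = wilcoxon_signed_rank(supermatrix_lengths, supertree_lengths)
print(w_plus, w_minus, p_value)
```

Because each dataset is analyzed by every method, the test operates on within-dataset differences rather than on pooled values, which is why a matched-pairs test suits a design of this kind. For larger numbers of pairs, an established statistics library would normally be preferable to this enumeration.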
